Chapter 1
Introduction 



The NVAX CPU is a high-performance, single-chip implementation of the VAX architecture. It 
is partitioned into multiple sections which cooperate to execute the VAX base instruction group. 
The CPU chip includes the first levels of the memory subsystem hierarchy in an on-chip virtual 
instruction cache and an on-chip physical instruction and data cache, as well as the controller 
for a large second-level cache implemented in static RAMs on the CPU module. 

1.1 Scope and Organization of this Specification

This specification describes the operation of the NVAX CPU chip. It contains a description of the 
interface to the chip, an overview of the operation of the instruction pipeline, and extensive detail 
about the functional operation of each section of the chip. In addition, the specification contains 
discussions of error handling, chip initialization, and testability features. 

1.2 Related Documents 

The following documents are related to or were used in the preparation of this document: 

• DEC Standard 032 VAX Architecture Standard. 

• NVAX CPU Chip Design Methodology. 

1.3 Terminology and Conventions

1.3.1 Numbering

All numbers are decimal unless otherwise indicated. Where there is ambiguity, numbers other 
than decimal are indicated with the name of the base following the number in parentheses, e.g., 
FF (hex). 

1.3.2 UNPREDICTABLE and UNDEFINED 

Results specified as UNPREDICTABLE may vary from moment to moment, implementation
to implementation, and instruction to instruction within implementations. Software can never
depend on results specified as UNPREDICTABLE.

Operations specified as UNDEFINED may vary from moment to moment, implementation to
implementation, and instruction to instruction within implementations. The operation may vary
in effect from nothing to stopping system operation. UNDEFINED operations must not cause
the processor to hang, i.e., reach a state from which there is no transition to a normal state in
which the machine executes instructions.

Note the distinction between result and operation. Non-privileged software cannot invoke
UNDEFINED operations.

1.3.3 Ranges and Extents 

Ranges are specified by a pair of numbers separated by two periods (..) and are inclusive, e.g., a range of
integers 0..4 includes the integers 0, 1, 2, 3, and 4.

Extents are specified by a pair of numbers in angle brackets separated by a colon and are inclusive, 
e.g., bits <7:3> specify an extent of bits including bits 7, 6, 5, 4, and 3. 
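
As an illustration only (it is not part of the specification), the inclusive range and extent conventions can be expressed in a short Python sketch. The helper names `range_members` and `extract_extent` are hypothetical.

```python
def range_members(first, last):
    """Return the integers in the inclusive range first..last."""
    return list(range(first, last + 1))


def extract_extent(value, high, low):
    """Return bits <high:low> of value; both end points are included."""
    if high < low or low < 0:
        raise ValueError("extent must satisfy high >= low >= 0")
    width = high - low + 1           # an extent <7:3> is 5 bits wide
    mask = (1 << width) - 1
    return (value >> low) & mask
```

For example, `extract_extent(0xF8, 7, 3)` selects bits 7, 6, 5, 4, and 3 of FF8 (hex) style values and right-justifies them.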

1.3.4 Must be Zero (MBZ) 

Fields specified as Must Be Zero (MBZ) must never be filled by software with a non-zero value. 
If the processor encounters a non-zero value in a field specified as MBZ, a Reserved Operand 
exception occurs. 

1.3.5 Should be Zero (SBZ) 

Fields specified as Should Be Zero (SBZ) should be filled by software with a zero value. These 
fields may be used at some future time. Non-zero values in SBZ fields produce UNPREDICTABLE 
results. 

1.3.6 Register Format Notation 

This specification contains a number of figures that show the format of various registers, followed 
by a description of each field. In general, the fields on the register are labeled with either a name 
or a mnemonic. The description of each field includes the name or mnemonic, the bit extent, 
and the type. An example of a register is shown in Figure 1-1. Table 1-1 is an example of the
description of the fields in this register.

Figure 1-1: Register Format Example

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0|       FAULT_CMD       | x  x  x  x|IE| 0  0  0  0  0  0  0  0|  |  |  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                         |  |  |
                                                                                  TRAP --+  |  |
                                                                                INTERRUPT --+  |
                                                                                   BUS_ERROR --+


Table 1-1: Register Field Description Example

Name        Extent  Type   Description

BUS_ERROR   0       WC,0   The BUS_ERROR bit is set when a bus error is detected.

INTERRUPT   1       WC,0   The INTERRUPT bit is set when an error that is reported as an
                           interrupt is detected.

TRAP        2       WC,0   The TRAP bit is set when an error that is reported as a trap is detected.

IE          11      RW,0   The IE bit enables error reporting interrupts. When IE is 0, interrupts
                           are disabled. When IE is 1, interrupts are enabled.

FAULT_CMD   23:16   RO     The FAULT_CMD field latches the command that was in progress when
                           an error is detected.



The "Type" column in the field description includes both the actual type of the field, and an
optional initialized value, separated from the type by a comma. The type denotes the functional
operation of the field, and may be one of the values shown in Table 1-2. If present, the initialized
value indicates that the field is initialized by hardware or microcode to the specified value at
powerup. If the initialized value is not present, the field is not initialized at powerup.



Table 1-2: Register Field Type Notation

Notation  Description

RW        A read-write bit or field. The value may be read and written by software, microcode,
          or hardware.

RO        A read-only bit or field. The value may be read by software, microcode, or hardware.
          It is written by hardware; software or microcode writes are ignored.

WO        A write-only bit or field. The value may be written by software or microcode. It is read
          by hardware and reads by software or microcode return an UNPREDICTABLE result.

WZ        A write-only bit or field. The value may be written by software or microcode. It is read
          by hardware and reads by software or microcode return a 0.

WC        A write-one-to-clear bit. The value may be read by software or microcode. Software or
          microcode writes of a 1 cause the bit to be cleared by hardware. Software or microcode
          writes of a 0 do not modify the state of the bit.

RC        A read-to-clear field. The value is written by hardware and remains unchanged until
          read. The value may be read by software or microcode, at which point, hardware may
          write a new value into the field.
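
The following Python sketch is an illustrative software model, not a description of the hardware, of how the RW, RO, and WC field types of Table 1-2 respond to software writes. The class `FieldModel` and its method names are hypothetical and exist only for this example.

```python
class FieldModel:
    """Hypothetical model of one register field of type RW, RO, or WC."""

    def __init__(self, kind, width=1, init=0):
        if kind not in ("RW", "RO", "WC"):
            raise ValueError("unsupported field type in this sketch")
        self.kind = kind
        self.mask = (1 << width) - 1
        self.value = init & self.mask

    def hardware_write(self, data):
        """Hardware may update any of the three field types."""
        self.value = data & self.mask

    def software_write(self, data):
        """Apply a software or microcode write according to the field type."""
        data &= self.mask
        if self.kind == "RW":
            self.value = data
        elif self.kind == "WC":
            # Writing 1 clears the bit; writing 0 leaves it unchanged.
            self.value &= ~data & self.mask
        # RO: software or microcode writes are ignored.

    def software_read(self):
        return self.value
```

A WC error bit such as BUS_ERROR in Table 1-1 would be set through `hardware_write(1)` and acknowledged by software through `software_write(1)`.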

In addition to named fields in registers, other bits of the register may be labeled with one of the
three symbols listed in Table 1-3. These symbols denote the type of the unnamed fields in the
register.

Table 1-3: Register Field Notation

Notation  Description

0         A "0" in a bit position denotes a register bit that is read as a 0 and ignored on write.

1         A "1" in a bit position denotes a register bit that is read as a 1 and ignored on write.

x         An "x" in a bit position denotes a register bit that does not exist in hardware. The
          value is UNPREDICTABLE when read, and ignored on write.

1.3.7 Timing Diagram Notation

This specification contains a number of timing diagrams that show the timing of various signals,
including NDAL signals. The notation used in these timing diagrams is shown in Figure 1-2.



Figure 1-2: Timing Diagram Notation 

(Waveform symbols not reproduced. The figure defines a symbol for each of the following signal
states and transitions.)

HIGH
LOW
INTERMEDIATE
VALID_HIGH_OR_LOW
CHANGING
INVALID_BUT_NOT_CHANGING
HIGH_TO_LOW
HIGH_TO_VALID
HIGH_TO_INVALID
INTERMEDIATE_TO_LOW
HIGH_TO_INTERMEDIATE
LOW_TO_HIGH
LOW_TO_VALID
LOW_TO_INVALID
INTERMEDIATE_TO_HIGH
LOW_TO_INTERMEDIATE
VALID_TO_INTERMEDIATE
INVALID_TO_INTERMEDIATE
INTERMEDIATE_TO_VALID
INTERMEDIATE_TO_INVALID

1.4 Revision History

Table 1-4: Revision History

Who          When          Description of change

Mike Uhler   06-Mar-1989   Release for external review.
Mike Uhler   15-Dec-1989   Update for second-pass release.

2.1 Overview

This chapter provides a summary of the VAX architectural features of the NVAX CPU Chip. It is 
not intended as a complete reference but rather to give an overview of the user-visible features. 
For a complete description of the architecture, consult the VAX Architecture Standard (DEC 
Standard 032). 

2.2 Visible State 

The visible state of the processor consists of memory, both virtual and physical, the general
registers, the processor status longword (PSL), and the privileged internal processor registers
(IPRs).

2.2.1 Virtual Address Space

The virtual address space is four gigabytes (2**32), separated into three accessible regions (P0,
P1, and S0) and one reserved page, as shown in Figure 2-1.

Figure 2-1: Virtual Address Space Layout



00000000 



3FFFFFFF 
40000000 



80000000 




rsoo 



System 
Region 




length of P0 Region in 
pages (P0LR) 



P0 Region growth direction 
P1 Region growth direction



length of P1 Region in
pages (2**21-P1LR) 

length of System Region 
in pages (SLR) 



System Region growth 
direction 



NOTE 

NVAX CPU chips at revision 1 implement the original VAX memory management 
architecture in which any reference to a virtual address above BFFFFFFF (hex) causes 
a length violation. NVAX CPU chips at revision 2 or later implement the extended S0
space addressing described above.

2.2.2 Physical Address Space 

The NVAX CPU naturally generates 32-bit physical addresses. This corresponds to a four gigabyte
physical address space as shown in Figure 2-2. Memory space occupies the first seven-eighths
(3.5GB) of the physical address space. I/O space occupies the last one-eighth (512MB) of the
physical address space and can be distinguished from memory space by the fact that bits <31:29>
of the physical address are all ones.
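
The test on bits <31:29> can be sketched in Python as follows. This is only an illustration of the address arithmetic described above; the names `IO_SPACE_BASE` and `is_io_space` are hypothetical.

```python
IO_SPACE_BASE = 0xE0000000  # first address with PA<31:29> all ones


def is_io_space(pa):
    """Return True if a 32-bit physical address lies in I/O space."""
    pa &= 0xFFFFFFFF
    return (pa >> 29) & 0x7 == 0x7   # bits <31:29> are all ones


# Memory space is seven-eighths of 4 GB, I/O space the remaining eighth.
MEMORY_SPACE_BYTES = IO_SPACE_BASE
IO_SPACE_BYTES = (1 << 32) - IO_SPACE_BASE
```

Under these definitions `MEMORY_SPACE_BYTES` is 3.5 GB and `IO_SPACE_BYTES` is 512 MB, matching the split stated above.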



2-2 Architectural Summary 



DIGITAL CONFIDENTIAL 



NVAX CPU Chip Functional Specification, Revision 1.1, August 1991

Figure 2-2: 32-bit Physical Address Space Layout

         +-------------------+
00000000 |                   |
         |   Memory Space    |  3.5 GB
DFFFFFFF |                   |
         +-------------------+
E0000000 |                   |
         |     I/O Space     |  512 MB
FFFFFFFF |                   |
         +-------------------+




In addition to the natural 32-bit physical address, the CPU may be configured to generate 30-bit 
physical addresses. In this mode, only 512MB of memory space can be referenced, as shown in 
Figure 2-3. 



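Figure 2-3: 30-bit Physical Address Space Layout

         +-------------------+
00000000 |                   |
         |   Memory Space    |  512 MB
1FFFFFFF |                   |
         +-------------------+
         |                   |
         |   Inaccessible    |  3.0 GB
         |      Region       |
         +-------------------+
E0000000 |                   |
         |     I/O Space     |  512 MB
FFFFFFFF |                   |
         +-------------------+
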



The translation from 30-bit addresses to 32-bit addresses is accomplished by sign-extending
PA<29> to PA<31:30>. In this mode, the programmer sees a 1GB address space, split evenly
between memory and I/O space, which is mapped to the actual 32-bit physical address space as
shown in Table 2-1. Unless explicitly stated otherwise, addresses that are given in the remainder
of this specification are the full 32-bit addresses (which, of course, may have been generated from
a 30-bit program address via the mapping shown).



Table 2-1: 30-bit Mapping of Program Addresses to 32-bit Hardware Addresses

Program Address       Hardware Address

00000000..1FFFFFFF    00000000..1FFFFFFF
20000000..3FFFFFFF    E0000000..FFFFFFFF
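
The mapping of Table 2-1 follows directly from sign-extending PA<29> into PA<31:30>. A minimal Python sketch of that arithmetic is shown below; the function name `program_to_hardware_address` is hypothetical and the code models only the address calculation, not the hardware.

```python
def program_to_hardware_address(program_address):
    """Map a 30-bit program address to the full 32-bit hardware address."""
    pa = program_address & 0x3FFFFFFF      # keep PA<29:0>
    if pa & (1 << 29):                     # PA<29> set: I/O half
        pa |= 0xC0000000                   # copy PA<29> into PA<31:30>
    return pa
```

Program addresses in 00000000..1FFFFFFF (hex) are returned unchanged, while 20000000..3FFFFFFF (hex) map to E0000000..FFFFFFFF (hex).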



2.2.2.1 Physical Address Control Registers 

During powerup, microcode configures the CPU to generate 30-bit physical addresses. Console
firmware may then reconfigure the CPU and optional vector unit to generate either 30-bit or
32-bit physical addresses by writing to the MODE bit in the PAMODE register. The PAMODE
register is shown in Figure 2-4.

Figure 2-4: IPR E7 (hex), PAMODE

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0|  | :PAMODE
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                               |
                                                                                        MODE --+



The PAMODE register also determines how PTEs are to be interpreted. In 30-bit mode, PTEs 
are interpreted in 21-bit PFN format. In 32-bit mode, PTEs are interpreted in 25-bit PFN format 
(although the two upper bits of the PFN field are ignored). The different PTE formats are 
described in Section 2.6.4. 
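
The two PFN widths are consistent with the two physical address sizes: with the 512-byte VAX page, a 21-bit PFN yields a 30-bit address, and a 25-bit PFN with its two upper bits ignored yields a 32-bit address. The Python sketch below illustrates only this arithmetic. It takes the PFN value itself rather than a PTE, because the PTE layout is defined in Section 2.6.4, and it assumes that "ignored" means the upper two bits are masked off. The names are hypothetical.

```python
PAGE_SHIFT = 9  # 512-byte VAX page


def pfn_to_page_address(pfn, mode_32bit):
    """Return the physical base address of the page selected by pfn.

    Assumption for this sketch: in 32-bit mode the two upper bits of the
    25-bit PFN field are simply masked off, leaving 23 significant bits.
    """
    significant_bits = 23 if mode_32bit else 21
    pfn &= (1 << significant_bits) - 1
    return pfn << PAGE_SHIFT
```

In 30-bit mode the result is a 30-bit address, which is still subject to the mapping of Table 2-1.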

The PAMODE register is described in more detail in Chapter 12. 

2.2.3 Registers 

There are 16 32-bit General Purpose Registers (GPRs). The format is shown in Figure 2-5, and 
the use of each GPR is shown in Table 2-2. 

Figure 2-5: General Purpose Registers

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                                                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



Table 2-2: General Purpose Register Usage 

GPR Synonym Use 

R0-R11 General Purpose 

R12 AP Argument Pointer 

R13 FP Frame Pointer 

R14 SP Stack Pointer 

R15 PC Program Counter 



The Processor Status Longword (PSL) is a 32-bit register which contains processor state. The
PSL format is shown in Figure 2-6, and the fields of the PSL are shown in Table 2-3.

Figure 2-6: Processor Status Longword Fields

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|  |  |  |MB|FP|  | CUR | PRV |MB|              |                       |  |  |  |  |  |  |  |  |
|CM|TP|VM|Z |D |IS| MOD | MOD |Z |     IPL      |          MBZ          |DV|FU|IV| T| N| Z| V| C| :PSL
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



Table 2-3: Processor Status Longword

Name      Bit(s)  Description

CM        31      Compatibility Mode (1)
TP        30      Trace Pending
VM        29      Virtual Machine Mode (1)
FPD       27      First Part Done
IS        26      Interrupt Stack
CUR_MOD   25:24   Current Mode
PRV_MOD   23:22   Previous Mode
IPL       20:16   Interrupt Priority Level
DV        7       Decimal Overflow Trap Enable
FU        6       Floating Underflow Fault Enable
IV        5       Integer Overflow Trap Enable
T         4       Trace Trap Enable

(1) MBZ in current implementation

Table 2-3 (Cont.): Processor Status Longword

Name      Bit(s)  Description

N         3       Negative Condition Code
Z         2       Zero Condition Code
V         1       Overflow Condition Code
C         0       Carry Condition Code
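
As an illustration of the field positions in Table 2-3, the following Python sketch splits a 32-bit PSL value into its named fields. The dictionary `PSL_FIELDS` and the function `decode_psl` are hypothetical names; the bit positions are those listed in the table.

```python
# Field name -> (high bit, low bit), taken from Table 2-3.
PSL_FIELDS = {
    "CM": (31, 31), "TP": (30, 30), "VM": (29, 29), "FPD": (27, 27),
    "IS": (26, 26), "CUR_MOD": (25, 24), "PRV_MOD": (23, 22),
    "IPL": (20, 16), "DV": (7, 7), "FU": (6, 6), "IV": (5, 5),
    "T": (4, 4), "N": (3, 3), "Z": (2, 2), "V": (1, 1), "C": (0, 0),
}


def decode_psl(psl):
    """Return a dict mapping each PSL field name to its value."""
    fields = {}
    for name, (high, low) in PSL_FIELDS.items():
        width = high - low + 1
        fields[name] = (psl >> low) & ((1 << width) - 1)
    return fields
```

For instance, `decode_psl(0x041F0000)` reports IS as 1 and IPL as 1F (hex), with all other fields zero.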



2.3 Data Types 



The NVAX CPU supports nine data types: byte, word, longword, quadword, character string,
variable length bit field, F_floating, D_floating, and G_floating. These are summarized in
Figure 2-7.



Figure 2-7: Data Types 



 07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+
|                       | :A
+--+--+--+--+--+--+--+--+

Data Type: Byte
Length: 8 bits
Use: Signed or unsigned integer

 15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Data Type: Word
Length: 16 bits
Use: Signed or unsigned integer

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                                                                               | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Data Type: Longword
Length: 32 bits
Use: Signed or unsigned integer



Figure 2-7 Cont'd on next page 



2-6 Architectural Summary 



DIGITAL CONFIDENTIAL 



NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 



Figure 2-7 (Cont.): Data Types 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                                                                               | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                                                                               | :A+4
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Data Type: Quadword
Length: 64 bits
Use: Signed integer

 07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+
|                       | :A
+--+--+--+--+--+--+--+--+
|                       | :A+1
+--+--+--+--+--+--+--+--+
            .
            .
            .
+--+--+--+--+--+--+--+--+
|                       | :A+length-1
+--+--+--+--+--+--+--+--+

Data Type: Character String
Length: 0-64K bytes
Use: Byte string

 31         P+S P+S-1                                                           P P-1         00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|              |/////////////////////////////////////////////////////////////////|              | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Data Type: Variable length bit field
Length: 0-32 bits
Use: Bit string

 15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| s|       exponent        |      fraction      | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+2
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16

Data Type: F_floating
Length: 32 bits
Use: Floating point

 15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| s|       exponent        |      fraction      | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+2
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+4
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+6
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 63 62 61 60|59 58 57 56|55 54 53 52|51 50 49 48

Data Type: D_floating
Length: 64 bits
Use: Floating point

 15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| s|            exponent            | fraction  | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+2
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+4
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+6
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 63 62 61 60|59 58 57 56|55 54 53 52|51 50 49 48

Data Type: G_floating
Length: 64 bits
Use: Floating point

2.4 Instruction Formats and Addressing Modes 

VAX instructions consist of a one- or two-byte opcode, followed by zero to six operand specifiers. 

2.4.1 Opcode Formats 

An opcode may be either one or two contiguous bytes. The two-byte format begins with an FD 
(hex) byte and is followed by a second opcode byte. The one-byte format is indicated by an opcode 
byte whose value is anything other than FD (hex). The one- or two-byte opcode format is shown 
in Figure 2-8. 
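
The opcode length rule can be sketched in a few lines of Python. This is an illustration of the rule stated above, not a decoder for the full instruction set; the function name `split_opcode` is hypothetical.

```python
def split_opcode(stream):
    """Split an instruction byte stream into (opcode bytes, remaining bytes).

    The opcode is two bytes long if the first byte is FD (hex),
    otherwise it is one byte long.
    """
    if len(stream) == 0:
        raise ValueError("empty instruction stream")
    length = 2 if stream[0] == 0xFD else 1
    if len(stream) < length:
        raise ValueError("truncated two-byte opcode")
    return stream[:length], stream[length:]
```

The bytes that follow the opcode are the operand specifiers and any branch displacement.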

Figure 2-8: Opcode Formats

                                           07 06 05 04|03 02 01 00
                                          +--+--+--+--+--+--+--+--+
One-byte opcode:                          |        opcode         | :A
                                          +--+--+--+--+--+--+--+--+

                   15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Two-byte opcode:  |        opcode         |          FD           | :A
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



2.4.2 Addressing Modes 

An operand specifier starts with a specifier byte and may be followed by a specifier extension. 
Bits <3:0> of the specifier byte contain a GPR number and bits <7:4> of the specifier byte indicate 
the addressing mode of the specifier. If the register number in the specifier byte does not contain 
15, the addressing mode is a general register addressing mode. If the register number in the 
specifier byte does contain 15, the addressing mode is a PC-relative addressing mode. The 
different addressing modes are shown graphically in Figure 2-9. General register addressing
modes are listed in Table 2-4 and PC-relative addressing modes are listed in Table 2-5.

Figure 2-9: Addressing Modes

                   07 06 05 04|03 02 01 00
General register  +--+--+--+--+--+--+--+--+
addressing mode:  |   mode    | register  |
                  +--+--+--+--+--+--+--+--+

                   07 06 05 04|03 02 01 00
PC-relative       +--+--+--+--+--+--+--+--+
addressing mode:  |   mode    | 1  1  1  1|
                  +--+--+--+--+--+--+--+--+






Table 2-4: General Register Addressing Modes

                                                     Access
Mode  Name                             Assembler    r   m   w   a   v    PC  SP  Indexable?

0-3   literal                          S^#literal   y   f   f   f   f    x   x   f
4     index                            i[Rx]        y   y   y   y   y    u   y   f
5     register                         Rn           y   y   y   f   y    u   uq  f
6     register deferred                (Rn)         y   y   y   y   y    u   y   y
7     autodecrement                    -(Rn)        y   y   y   y   y    u   y   ux
8     autoincrement                    (Rn)+        y   y   y   y   y    p   y   ux
9     autoincrement deferred           @(Rn)+       y   y   y   y   y    p   y   ux
A     byte displacement                B^d(Rn)      y   y   y   y   y    p   y   y
B     byte displacement deferred       @B^d(Rn)     y   y   y   y   y    p   y   y
C     word displacement                W^d(Rn)      y   y   y   y   y    p   y   y
D     word displacement deferred       @W^d(Rn)     y   y   y   y   y    p   y   y
E     longword displacement            L^d(Rn)      y   y   y   y   y    p   y   y
F     longword displacement deferred   @L^d(Rn)     y   y   y   y   y    p   y   y

Access Types

r = read
m = modify
w = write
a = address
v = variable bit field

Syntax

i = any indexable address mode
d = displacement
Rn = general register, n = 0 to 15
Rx = general register, x = 0 to 14

Results

y = yes, always valid address mode
f = reserved addressing mode fault
x = logically impossible
p = program counter addressing
u = unpredictable
ud = unpredictable for destination of CALLG, CALLS, JMP and JSB
uq = unpredictable for quad, D/G_floating and field if pos+size > 32
ux = unpredictable if index register = base register

Table 2-5: PC-Relative Addressing Modes

                                                     Access
Mode  Name                             Assembler    r   m   w   a   v    PC  SP  Indexable?

8     immediate                        I^#constant  y   u   u   y   ud           u
9     absolute                         @#address    y   y   y   y   y            y
A     byte relative                    B^address    y   y   y   y   y            y
B     byte relative deferred           @B^address   y   y   y   y   y            y
C     word relative                    W^address    y   y   y   y   y            y
D     word relative deferred           @W^address   y   y   y   y   y            y
E     longword relative                L^address    y   y   y   y   y            y
F     longword relative deferred       @L^address   y   y   y   y   y            y

For notation, refer to the key in Table 2-4.



2.4.3 Branch Displacements 

Branch instructions contain a one- or two-byte signed branch displacement after the final specifier 
(if any). The branch displacement is shown in Figure 2-10. 
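
A signed displacement is interpreted by sign-extending it to 32 bits. The Python sketch below illustrates this. The target calculation in `branch_target` assumes the usual VAX convention that the displacement is added to the address of the byte following the displacement; that convention is not stated in this section and is labeled here as an assumption. Both function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def branch_target(pc_after_displacement, displacement, size_bytes):
    """Return the 32-bit branch target (assumed VAX convention, see above)."""
    offset = sign_extend(displacement, 8 * size_bytes)
    return (pc_after_displacement + offset) & 0xFFFFFFFF
```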

Figure 2-10: Branch Displacements

                                           07 06 05 04|03 02 01 00
Signed byte                               +--+--+--+--+--+--+--+--+
displacement:                             |     displacement      |
                                          +--+--+--+--+--+--+--+--+

                   15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
Signed word       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
displacement:     |                 displacement                  |
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



2.5 Instruction Set 

The NVAX CPU supports the VAX Base Instruction Group as defined in DEC Standard 032. 
These instructions are listed in Table 2-6. 

Table 2-6: NVAX Instruction Set

Opcode  Instruction  N Z V C  Exceptions

Integer, Arithmetic and Logical Instructions

58  ADAWI add.rw, sum.mw  * * * *  iov

80  ADDB2 add.rb, sum.mb  * * * *  iov
C0  ADDL2 add.rl, sum.ml  * * * *  iov
A0  ADDW2 add.rw, sum.mw  * * * *  iov

81  ADDB3 add1.rb, add2.rb, sum.wb  * * * *  iov
C1  ADDL3 add1.rl, add2.rl, sum.wl  * * * *  iov
A1  ADDW3 add1.rw, add2.rw, sum.ww  * * * *  iov

D8  ADWC add.rl, sum.ml  * * * *  iov

78  ASHL cnt.rb, src.rl, dst.wl  * * * 0  iov
79  ASHQ cnt.rb, src.rq, dst.wq  * * * 0  iov

8A  BICB2 mask.rb, dst.mb  * * 0 -
CA  BICL2 mask.rl, dst.ml  * * 0 -
AA  BICW2 mask.rw, dst.mw  * * 0 -

8B  BICB3 mask.rb, src.rb, dst.wb  * * 0 -
CB  BICL3 mask.rl, src.rl, dst.wl  * * 0 -
AB  BICW3 mask.rw, src.rw, dst.ww  * * 0 -

88  BISB2 mask.rb, dst.mb  * * 0 -
C8  BISL2 mask.rl, dst.ml  * * 0 -
A8  BISW2 mask.rw, dst.mw  * * 0 -

89  BISB3 mask.rb, src.rb, dst.wb  * * 0 -
C9  BISL3 mask.rl, src.rl, dst.wl  * * 0 -
A9  BISW3 mask.rw, src.rw, dst.ww  * * 0 -

93  BITB mask.rb, src.rb  * * 0 -
D3  BITL mask.rl, src.rl  * * 0 -
B3  BITW mask.rw, src.rw  * * 0 -

Table 2-6 (Cont.): NVAX Instruction Set

Opcode  Instruction  N Z V C  Exceptions

Integer, Arithmetic and Logical Instructions

94  CLRB dst.wb  0 1 0 -
D4  CLRL{=F} dst.wl  0 1 0 -
7C  CLRQ{=D=G} dst.wq  0 1 0 -
B4  CLRW dst.ww  0 1 0 -

91  CMPB src1.rb, src2.rb  * * 0 *
D1  CMPL src1.rl, src2.rl  * * 0 *
B1  CMPW src1.rw, src2.rw  * * 0 *

98  CVTBL src.rb, dst.wl  * * 0 0
99  CVTBW src.rb, dst.ww  * * 0 0
F6  CVTLB src.rl, dst.wb  * * * 0  iov
F7  CVTLW src.rl, dst.ww  * * * 0  iov
33  CVTWB src.rw, dst.wb  * * * 0  iov
32  CVTWL src.rw, dst.wl  * * 0 0

97  DECB dif.mb  * * * *  iov
D7  DECL dif.ml  * * * *  iov
B7  DECW dif.mw  * * * *  iov

86  DIVB2 divr.rb, quo.mb  * * * 0  iov, idvz
C6  DIVL2 divr.rl, quo.ml  * * * 0  iov, idvz
A6  DIVW2 divr.rw, quo.mw  * * * 0  iov, idvz

87  DIVB3 divr.rb, divd.rb, quo.wb  * * * 0  iov, idvz
C7  DIVL3 divr.rl, divd.rl, quo.wl  * * * 0  iov, idvz
A7  DIVW3 divr.rw, divd.rw, quo.ww  * * * 0  iov, idvz

7B  EDIV divr.rl, divd.rq, quo.wl, rem.wl  * * * 0  iov, idvz
7A  EMUL mulr.rl, muld.rl, add.rl, prod.wq  * * 0 0

96  INCB sum.mb  * * * *  iov
D6  INCL sum.ml  * * * *  iov

Table 2-6 (Cont.): NVAX Instruction Set

Opcode  Instruction  N Z V C  Exceptions

Integer, Arithmetic and Logical Instructions

B6  INCW sum.mw  * * * *  iov

92  MCOMB src.rb, dst.wb  * * 0 -
D2  MCOML src.rl, dst.wl  * * 0 -
B2  MCOMW src.rw, dst.ww  * * 0 -

8E  MNEGB src.rb, dst.wb  * * * *  iov
CE  MNEGL src.rl, dst.wl  * * * *  iov
AE  MNEGW src.rw, dst.ww  * * * *  iov

90  MOVB src.rb, dst.wb  * * 0 -
D0  MOVL src.rl, dst.wl  * * 0 -
7D  MOVQ src.rq, dst.wq  * * 0 -
B0  MOVW src.rw, dst.ww  * * 0 -

9A  MOVZBW src.rb, dst.ww  0 * 0 -
9B  MOVZBL src.rb, dst.wl  0 * 0 -
3C  MOVZWL src.rw, dst.wl  0 * 0 -

84  MULB2 mulr.rb, prod.mb  * * * 0  iov
C4  MULL2 mulr.rl, prod.ml  * * * 0  iov
A4  MULW2 mulr.rw, prod.mw  * * * 0  iov

85  MULB3 mulr.rb, muld.rb, prod.wb  * * * 0  iov
C5  MULL3 mulr.rl, muld.rl, prod.wl  * * * 0  iov
A5  MULW3 mulr.rw, muld.rw, prod.ww  * * * 0  iov

DD  PUSHL src.rl, {-(SP).wl}  * * 0 -
9C  ROTL cnt.rb, src.rl, dst.wl  * * 0 -
D9  SBWC sub.rl, dif.ml  * * * *  iov
82  SUBB2 sub.rb, dif.mb  * * * *  iov

Table 2-6 (Cont.): NVAX Instruction Set

Opcode  Instruction  N Z V C  Exceptions

Integer, Arithmetic and Logical Instructions

C2  SUBL2 sub.rl, dif.ml  * * * *  iov
A2  SUBW2 sub.rw, dif.mw  * * * *  iov

83  SUBB3 sub.rb, min.rb, dif.wb  * * * *  iov
C3  SUBL3 sub.rl, min.rl, dif.wl  * * * *  iov
A3  SUBW3 sub.rw, min.rw, dif.ww  * * * *  iov

95  TSTB src.rb  * * 0 0
D5  TSTL src.rl  * * 0 0
B5  TSTW src.rw  * * 0 0

8C  XORB2 mask.rb, dst.mb  * * 0 -
CC  XORL2 mask.rl, dst.ml  * * 0 -
AC  XORW2 mask.rw, dst.mw  * * 0 -

8D  XORB3 mask.rb, src.rb, dst.wb  * * 0 -
CD  XORL3 mask.rl, src.rl, dst.wl  * * 0 -
AD  XORW3 mask.rw, src.rw, dst.ww  * * 0 -

Address Instructions

9E  MOVAB src.ab, dst.wl  * * 0 -
DE  MOVAL{=F} src.al, dst.wl  * * 0 -
7E  MOVAQ{=D=G} src.aq, dst.wl  * * 0 -
3E  MOVAW src.aw, dst.wl  * * 0 -

9F  PUSHAB src.ab, {-(SP).wl}  * * 0 -
DF  PUSHAL{=F} src.al, {-(SP).wl}  * * 0 -
7F  PUSHAQ{=D=G} src.aq, {-(SP).wl}  * * 0 -
3F  PUSHAW src.aw, {-(SP).wl}  * * 0 -

Variable-Length Bit Field Instructions

EC  CMPV pos.rl, size.rb, base.vb, {field.rv}, src.rl  * * 0 *  rsv
ED  CMPZV pos.rl, size.rb, base.vb, {field.rv}, src.rl  * * 0 *  rsv

Table 2-6 (Cont.): NVAX Instruction Set

Opcode  Instruction  N Z V C  Exceptions

Variable-Length Bit Field Instructions

EE  EXTV pos.rl, size.rb, base.vb, {field.rv}, dst.wl  * * 0 -  rsv
EF  EXTZV pos.rl, size.rb, base.vb, {field.rv}, dst.wl  * * 0 -  rsv
F0  INSV src.rl, pos.rl, size.rb, base.vb, {field.wv}  - - - -  rsv
EB  FFC startpos.rl, size.rb, base.vb, {field.rv}, findpos.wl  0 * 0 0  rsv
EA  FFS startpos.rl, size.rb, base.vb, {field.rv}, findpos.wl  0 * 0 0  rsv

Control Instructions

9D  ACBB limit.rb, add.rb, index.mb, displ.bw  * * * -  iov
F1  ACBL limit.rl, add.rl, index.ml, displ.bw  * * * -  iov
3D  ACBW limit.rw, add.rw, index.mw, displ.bw  * * * -  iov

F3  AOBLEQ limit.rl, index.ml, displ.bb  * * * -  iov
F2  AOBLSS limit.rl, index.ml, displ.bb  * * * -  iov

1E  BCC{=BGEQU} displ.bb  - - - -
1F  BCS{=BLSSU} displ.bb  - - - -
13  BEQL{=BEQLU} displ.bb  - - - -
18  BGEQ displ.bb  - - - -
14  BGTR displ.bb  - - - -
1A  BGTRU displ.bb  - - - -
15  BLEQ displ.bb  - - - -
1B  BLEQU displ.bb  - - - -
19  BLSS displ.bb  - - - -
12  BNEQ{=BNEQU} displ.bb  - - - -
1C  BVC displ.bb  - - - -
1D  BVS displ.bb  - - - -

E1  BBC pos.rl, base.vb, displ.bb, {field.rv}  - - - -  rsv
E0  BBS pos.rl, base.vb, displ.bb, {field.rv}  - - - -  rsv

Table 2-6 (Cont.): NVAX Instruction Set

Opcode  Instruction  N Z V C  Exceptions

Control Instructions

E5  BBCC pos.rl, base.vb, displ.bb, {field.mv}  - - - -  rsv
E3  BBCS pos.rl, base.vb, displ.bb, {field.mv}  - - - -  rsv
E4  BBSC pos.rl, base.vb, displ.bb, {field.mv}  - - - -  rsv
E2  BBSS pos.rl, base.vb, displ.bb, {field.mv}  - - - -  rsv

E7  BBCCI pos.rl, base.vb, displ.bb, {field.mv}  - - - -  rsv
E6  BBSSI pos.rl, base.vb, displ.bb, {field.mv}  - - - -  rsv

E9  BLBC src.rl, displ.bb  - - - -
E8  BLBS src.rl, displ.bb  - - - -

11  BRB displ.bb  - - - -
31  BRW displ.bw  - - - -

10  BSBB displ.bb, {-(SP).wl}  - - - -
30  BSBW displ.bw, {-(SP).wl}  - - - -

8F  CASEB selector.rb, base.rb, limit.rb, displ.bw-list  * * 0 *
CF  CASEL selector.rl, base.rl, limit.rl, displ.bw-list  * * 0 *
AF  CASEW selector.rw, base.rw, limit.rw, displ.bw-list  * * 0 *

17  JMP dst.ab  - - - -
16  JSB dst.ab, {-(SP).wl}  - - - -
05  RSB {(SP)+.rl}  - - - -

F4  SOBGEQ index.ml, displ.bb  * * * -  iov
F5  SOBGTR index.ml, displ.bb  * * * -  iov

Table 2-6 (Cont.): NVAX Instruction Set

Opcode  Instruction  N Z V C  Exceptions

Procedure Call Instructions

FA  CALLG arglist.ab, dst.ab, {-(SP).w*}  0 0 0 0  rsv
FB  CALLS numarg.rl, dst.ab, {-(SP).w*}  0 0 0 0  rsv
04  RET {(SP)+.r*}  * * * *  rsv

Miscellaneous Instructions

B9  BICPSW mask.rw  * * * *  rsv
B8  BISPSW mask.rw  * * * *  rsv

03  BPT {-(KSP).w*}  0 0 0 0
00  HALT {-(KSP).w*}  - - - -  prv

0A  INDEX subscript.rl, low.rl, high.rl, size.rl, indexin.rl, indexout.wl  * * 0 0  sub

DC  MOVPSL dst.wl  - - - -
01  NOP  - - - -

BA  POPR mask.rw, {(SP)+.r*}  - - - -
BB  PUSHR mask.rw, {-(SP).w*}  - - - -

FC  XFC {unspecified operands}  0 0 0 0

Queue Instructions

5C  INSQHI entry.ab, header.aq  0 * 0 *  rsv
5D  INSQTI entry.ab, header.aq  0 * 0 *  rsv
0E  INSQUE entry.ab, pred.ab  * * 0 *

5E  REMQHI header.aq, addr.wl  0 * * *  rsv
5F  REMQTI header.aq, addr.wl  0 * * *  rsv
0F  REMQUE entry.ab, addr.wl  * * * *

Table 2-6 (Cont.): NVAX Instruction Set

Opcode  Instruction  N Z V C  Exceptions

Operating System Support Instructions

BD  CHME param.rw, {-(ySP).w*}  0 0 0 0
BC  CHMK param.rw, {-(ySP).w*}  0 0 0 0
BE  CHMS param.rw, {-(ySP).w*}  0 0 0 0
BF  CHMU param.rw, {-(ySP).w*}  0 0 0 0

06  LDPCTX {PCB.r*, -(KSP).w*}  - - - -  rsv, prv
DB  MFPR procreg.rl, dst.wl  * * 0 -  rsv, prv
DA  MTPR src.rl, procreg.rl  * * 0 -  rsv, prv

0C  PROBER mode.rb, len.rw, base.ab  0 * 0 -
0D  PROBEW mode.rb, len.rw, base.ab  0 * 0 -

02  REI {(SP)+.r*}  * * * *  rsv
07  SVPCTX {(SP)+.r*, PCB.w*}  - - - -  prv

Character String Instructions

29  CMPC3 len.rw, src1addr.ab, src2addr.ab  * * 0 *
2D  CMPC5 src1len.rw, src1addr.ab, fill.rb, src2len.rw, src2addr.ab  * * 0 *
3A  LOCC char.rb, len.rw, addr.ab  0 * 0 0
28  MOVC3 len.rw, srcaddr.ab, dstaddr.ab, {R0-5.wl}  0 1 0 0
2C  MOVC5 srclen.rw, srcaddr.ab, fill.rb, dstlen.rw, dstaddr.ab, {R0-5.wl}  * * 0 *
2A  SCANC len.rw, addr.ab, tbladdr.ab, mask.rb  0 * 0 0
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Table 2-6 (Cont): NVAX Instruction Set 



Opcode  Instruction                                     N Z V C  Exceptions

Character String Instructions

3B      SKPC char.rb, len.rw, addr.ab                   0 * 0 0
2B      SPANC len.rw, addr.ab, tbladdr.ab, mask.rb      0 * 0 0

Floating Point Instructions

60      ADDD2 add.rd, sum.md                            * * 0 0  rsv, fov, fuv
40      ADDF2 add.rf, sum.mf                            * * 0 0  rsv, fov, fuv
40FD    ADDG2 add.rg, sum.mg                            * * 0 0  rsv, fov, fuv
61      ADDD3 add1.rd, add2.rd, sum.wd                  * * 0 0  rsv, fov, fuv
41      ADDF3 add1.rf, add2.rf, sum.wf                  * * 0 0  rsv, fov, fuv
41FD    ADDG3 add1.rg, add2.rg, sum.wg                  * * 0 0  rsv, fov, fuv
71      CMPD src1.rd, src2.rd                           * * 0 0  rsv
51      CMPF src1.rf, src2.rf                           * * 0 0  rsv
51FD    CMPG src1.rg, src2.rg                           * * 0 0  rsv
6C      CVTBD src.rb, dst.wd                            * * 0 0
4C      CVTBF src.rb, dst.wf                            * * 0 0
4CFD    CVTBG src.rb, dst.wg                            * * 0 0
68      CVTDB src.rd, dst.wb                            * * * 0  rsv, iov
76      CVTDF src.rd, dst.wf                            * * 0 0  rsv, fov
6A      CVTDL src.rd, dst.wl                            * * * 0  rsv, iov
69      CVTDW src.rd, dst.ww                            * * * 0  rsv, iov
48      CVTFB src.rf, dst.wb                            * * * 0  rsv, iov
56      CVTFD src.rf, dst.wd                            * * 0 0  rsv
99FD    CVTFG src.rf, dst.wg                            * * 0 0  rsv
4A      CVTFL src.rf, dst.wl                            * * * 0  rsv, iov
49      CVTFW src.rf, dst.ww                            * * * 0  rsv, iov
48FD    CVTGB src.rg, dst.wb                            * * * 0  rsv, iov
33FD    CVTGF src.rg, dst.wf                            * * 0 0  rsv, fov, fuv
4AFD    CVTGL src.rg, dst.wl                            * * * 0  rsv, iov

Table 2-6 (Cont.): NVAX Instruction Set 



Opcode  Instruction                                     N Z V C  Exceptions

Floating Point Instructions

49FD    CVTGW src.rg, dst.ww                            * * * 0  rsv, iov
6E      CVTLD src.rl, dst.wd                            * * 0 0
4E      CVTLF src.rl, dst.wf                            * * 0 0
4EFD    CVTLG src.rl, dst.wg                            * * 0 0
6D      CVTWD src.rw, dst.wd                            * * 0 0
4D      CVTWF src.rw, dst.wf                            * * 0 0
4DFD    CVTWG src.rw, dst.wg                            * * 0 0
6B      CVTRDL src.rd, dst.wl                           * * * 0  rsv, iov
4B      CVTRFL src.rf, dst.wl                           * * * 0  rsv, iov
4BFD    CVTRGL src.rg, dst.wl                           * * * 0  rsv, iov
66      DIVD2 divr.rd, quo.md                           * * 0 0  rsv, fov, fuv, fdvz
46      DIVF2 divr.rf, quo.mf                           * * 0 0  rsv, fov, fuv, fdvz
46FD    DIVG2 divr.rg, quo.mg                           * * 0 0  rsv, fov, fuv, fdvz
67      DIVD3 divr.rd, divd.rd, quo.wd                  * * 0 0  rsv, fov, fuv, fdvz
47      DIVF3 divr.rf, divd.rf, quo.wf                  * * 0 0  rsv, fov, fuv, fdvz
47FD    DIVG3 divr.rg, divd.rg, quo.wg                  * * 0 0  rsv, fov, fuv, fdvz
72      MNEGD src.rd, dst.wd                            * * 0 0  rsv
52      MNEGF src.rf, dst.wf                            * * 0 0  rsv
52FD    MNEGG src.rg, dst.wg                            * * 0 0  rsv
70      MOVD src.rd, dst.wd                             * * 0 -  rsv
50      MOVF src.rf, dst.wf                             * * 0 -  rsv
50FD    MOVG src.rg, dst.wg                             * * 0 -  rsv
64      MULD2 mulr.rd, prod.md                          * * 0 0  rsv, fov, fuv
44      MULF2 mulr.rf, prod.mf                          * * 0 0  rsv, fov, fuv
44FD    MULG2 mulr.rg, prod.mg                          * * 0 0  rsv, fov, fuv
65      MULD3 mulr.rd, muld.rd, prod.wd                 * * 0 0  rsv, fov, fuv
45      MULF3 mulr.rf, muld.rf, prod.wf                 * * 0 0  rsv, fov, fuv
45FD    MULG3 mulr.rg, muld.rg, prod.wg                 * * 0 0  rsv, fov, fuv

Table 2-6 (Cont.): NVAX Instruction Set 



Opcode  Instruction                                     N Z V C  Exceptions

Floating Point Instructions

62      SUBD2 sub.rd, dif.md                            * * 0 0  rsv, fov, fuv
42      SUBF2 sub.rf, dif.mf                            * * 0 0  rsv, fov, fuv
42FD    SUBG2 sub.rg, dif.mg                            * * 0 0  rsv, fov, fuv
63      SUBD3 sub.rd, min.rd, dif.wd                    * * 0 0  rsv, fov, fuv
43      SUBF3 sub.rf, min.rf, dif.wf                    * * 0 0  rsv, fov, fuv
43FD    SUBG3 sub.rg, min.rg, dif.wg                    * * 0 0  rsv, fov, fuv
73      TSTD src.rd                                     * * 0 0  rsv
53      TSTF src.rf                                     * * 0 0  rsv
53FD    TSTG src.rg                                     * * 0 0  rsv

Microcode-Assisted Emulated Instructions

20      ADDP4 addlen.rw, addaddr.ab, sumlen.rw,         * * * 0  rsv, dov
          sumaddr.ab
21      ADDP6 add1len.rw, add1addr.ab, add2len.rw,      * * * 0  rsv, dov
          add2addr.ab, sumlen.rw, sumaddr.ab
F8      ASHP cnt.rb, srclen.rw, srcaddr.ab, round.rb,   * * * 0  rsv, dov
          dstlen.rw, dstaddr.ab
35      CMPP3 len.rw, src1addr.ab, src2addr.ab          * * 0 0
37      CMPP4 src1len.rw, src1addr.ab, src2len.rw,      * * 0 0
          src2addr.ab
0B      CRC tbl.ab, inicrc.rl, strlen.rw, stream.ab     * * 0 0
F9      CVTLP src.rl, dstlen.rw, dstaddr.ab             * * * 0  rsv, dov
36      CVTPL srclen.rw, srcaddr.ab, dst.wl             * * * 0  rsv, iov
08      CVTPS srclen.rw, srcaddr.ab, dstlen.rw,         * * * 0  rsv, dov
          dstaddr.ab
09      CVTSP srclen.rw, srcaddr.ab, dstlen.rw,         * * * 0  rsv, dov
          dstaddr.ab

Table 2-6 (Cont.): NVAX Instruction Set 



Opcode  Instruction                                     N Z V C  Exceptions

Microcode-Assisted Emulated Instructions

24      CVTPT srclen.rw, srcaddr.ab, tbladdr.ab,        * * * 0  rsv, dov
          dstlen.rw, dstaddr.ab
26      CVTTP srclen.rw, srcaddr.ab, tbladdr.ab,        * * * 0  rsv, dov
          dstlen.rw, dstaddr.ab
27      DIVP divrlen.rw, divraddr.ab, divdlen.rw,       * * * 0  rsv, dov, ddvz
          divdaddr.ab, quolen.rw, quoaddr.ab
38      EDITPC srclen.rw, srcaddr.ab, pattern.ab,       * * * *  rsv, dov
          dstaddr.ab
39      MATCHC objlen.rw, objaddr.ab, srclen.rw,        0 * 0 0
          srcaddr.ab
34      MOVP len.rw, srcaddr.ab, dstaddr.ab             * * 0 0
2E      MOVTC srclen.rw, srcaddr.ab, fill.rb,           * * 0 *
          tbladdr.ab, dstlen.rw, dstaddr.ab
2F      MOVTUC srclen.rw, srcaddr.ab, esc.rb,           * * * *
          tbladdr.ab, dstlen.rw, dstaddr.ab
25      MULP mulrlen.rw, mulraddr.ab, muldlen.rw,       * * * 0  rsv, dov
          muldaddr.ab, prodlen.rw, prodaddr.ab
22      SUBP4 sublen.rw, subaddr.ab, diflen.rw,         * * * 0  rsv, dov
          difaddr.ab
23      SUBP6 sublen.rw, subaddr.ab, minlen.rw,         * * * 0  rsv, dov
          minaddr.ab, diflen.rw, difaddr.ab

The notation used for operand specifiers is <name>.<access type><data type>. Implied operands (those locations that are 
referenced by the instruction but not specified by an operand) are denoted by curly braces {}. 

Access Type 

a = address operand 
b = branch displacement 
m = modified operand (both read and written) 
r = read only operand 
v = if not "Rn", same as a, otherwise R[n+1]'R[n] 
w = write only operand 

Data Type 

b = byte 
d = D_floating 
f = F_floating 
g = G_floating 
l = longword 
q = quadword 
v = field (used only in implied operands) 
w = word 
* = multiple longwords (used only in implied operands) 

Condition Codes Modification 

* = conditionally set/cleared 
- = not affected 
0 = cleared 
1 = set 

Exceptions 

rsv = reserved operand fault 
iov = integer overflow trap 
idvz = integer divide by zero trap 
fov = floating overflow fault 
fuv = floating underflow fault 
fdvz = floating divide by zero fault 
dov = decimal overflow trap 
ddvz = decimal divide by zero trap 
sub = subscript range trap 
prv = privileged instruction fault 
vec = vector unit disabled fault 
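
As an illustration of the notation above, the following Python sketch splits an operand specifier into its three parts. The function name `parse_operand` and the returned tuple layout are hypothetical and are not part of this specification; the lookup tables simply restate the legend.

```python
# Lookup tables restating the legend above (illustrative only).
ACCESS_TYPES = {
    "a": "address operand",
    "b": "branch displacement",
    "m": "modified operand",
    "r": "read only operand",
    "v": "address operand, or register pair if the specifier is Rn",
    "w": "write only operand",
}
DATA_TYPES = {
    "b": "byte", "d": "D_floating", "f": "F_floating", "g": "G_floating",
    "l": "longword", "q": "quadword", "v": "field", "w": "word",
    "*": "multiple longwords",
}


def parse_operand(spec):
    """Split '<name>.<access type><data type>' into (name, access, data type)."""
    # Implied operands are written in curly braces; drop them before parsing.
    name, dot, suffix = spec.strip("{} ").rpartition(".")
    if not dot or len(suffix) != 2:
        raise ValueError("not an operand specifier: %r" % spec)
    access, data_type = suffix
    return name, ACCESS_TYPES[access], DATA_TYPES[data_type]
```

For example, `parse_operand("len.rw")` describes a read only word operand named `len`.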
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2.6 Memory Management 

The NVAX CPU Chip supports a four gigabyte (2**32) virtual address space, divided into two 
sections, system space and process space. Process space is further subdivided into the P0 region 
and the P1 region. 



2.6.1 Memory Management Control Registers 

Memory management is controlled by three processor registers: Memory Management Enable 
(MAPEN), Translation Buffer Invalidate Single (TBIS), and Translation Buffer Invalidate All 
(TBIA). 

Bit <0> of the MAPEN register enables memory management if written with a 1 and disables 
memory management if written with a 0. The MAPEN register is shown in Figure 2-11. 



Figure 2-11: IPR 38 (hex), MAPEN 

   Bits      Contents
   <31:1>    0
   <0>       MME



The TBIS register controls translation buffer invalidation. Writing a virtual address into TBIS 
invalidates any entry which maps that virtual address. The TBIS format is shown in Figure 2-12. 



Figure 2-12: IPR 3A (hex), TBIS 

   Bits      Contents
   <31:0>    Virtual Address



The TBIA register also controls translation buffer invalidation. Writing a zero into TBIA 
invalidates the entire translation buffer. The TBIA format is shown in Figure 2-13. 

Figure 2-13: IPR 39 (hex), TBIA 

   Bits      Contents
   <31:0>    0



2.6.2 System Space Address Translation 

A virtual address with bit <31> = 1 is an address in the system virtual address space. 

System virtual address space is mapped by the System Page Table (SPT), which is defined by 
the System Base Register (SBR) and the System Length Register (SLR). The SBR contains the 
page-aligned physical address of the System Page Table. The SLR contains the size of the 
SPT in longwords, that is, the number of Page Table Entries. The Page Table Entry addressed 
by the System Base Register maps the first page of system virtual address space, that is, virtual 
byte address 80000000 (hex). These registers are shown in Figure 2-14. 

With a 22-bit SLR width, 2**22 - 1 pages in system space may be addressed. As a result, the last 
page of system space (beginning at virtual address FFFFFE00 (hex)) is not addressable. This 
page is therefore reserved, and a reference to any address in that page results in a length 
violation. 
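
The length check and SPTE address calculation described above can be sketched as follows. This is a minimal Python model, not the chip's implementation: the function and exception names are hypothetical, a 512-byte page and a 4-byte (longword) PTE are assumed, and neither the revision 1 address restriction nor the 30-bit physical address mode is modeled.

```python
PAGE_SHIFT = 9          # assumed 512-byte pages (byte offset in VA<8:0>)
PTE_BYTES = 4           # one longword per Page Table Entry


class LengthViolation(Exception):
    """Raised when the virtual page number is outside the page table."""


def spte_physical_address(va, sbr, slr):
    """Return the physical address of the SPTE that maps system address va."""
    if not va & 0x80000000:
        raise ValueError("not a system-space address")
    vpn = (va & 0x7FFFFFFF) >> PAGE_SHIFT      # VA<30:9>
    if vpn >= slr:                             # SLR holds the number of PTEs
        raise LengthViolation(hex(va))
    return (sbr + PTE_BYTES * vpn) & 0xFFFFFFFF


def physical_address(va, pfn):
    """Merge a page frame number with the byte offset of va."""
    return ((pfn << PAGE_SHIFT) | (va & 0x1FF)) & 0xFFFFFFFF
```

With the largest 22-bit SLR value, 3FFFFF (hex), the page at FFFFFE00 (hex) has virtual page number 3FFFFF (hex) and therefore fails the check, which matches the reserved last page described above.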

NOTE 

NVAX CPU chips at revision 1 implement the original VAX memory management 
architecture in which any reference to a virtual address above BFFFFFFF (hex) causes 
a length violation. NVAX CPU chips at revision 2 or later implement the extended S0 
space addressing described above. 

NOTE 

When the CPU is configured to generate 30-bit physical addresses, SBR<31:30> are 
ignored. 

Figure 2-14: IPR 0C (hex), SBR and IPR 0D (hex), SLR 

   SBR
   Bits      Contents
   <31:9>    Physical Page Address of SPT
   <8:0>     0

   SLR
   Bits      Contents
   <31:22>   0
   <21:0>    Length of SPT in Longwords



The system space translation algorithm is shown graphically in Figure 2-15. 

Figure 2-15: System Space Translation Algorithm 

   1. System-space virtual address: bit <31> = 1, virtual page number in
      bits <30:9>, byte offset in bits <8:0>.
   2. Extract the VPN, check it against the length, and add it to the
      physical address of the SPT base in SBR. If in 30-bit mode,
      sign-extend PA<29> to PA<31:30>.
   3. This yields the physical address of the SPTE. Fetch the SPTE, which
      contains the page frame number.
   4. Check access in the current mode. If in 30-bit mode, sign-extend
      PTE<20> to PTE<22:21>.
   5. Merge the page frame number with the byte offset to form the
      physical address.


2.6.3 Process Space Address Translation 

A virtual address with bit <31> = 0 is an address in the process virtual address space. Process 
space is divided into two equal sized, separately mapped regions. If virtual address bit <30> = 0, 
the address is in region P0. If virtual address bit <30> = 1, the address is in region P1. 
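
The region selection by address bits <31> and <30> can be expressed as a short Python sketch. The function name `classify_region` is hypothetical and is used only for illustration.

```python
def classify_region(va):
    """Return the region of a 32-bit virtual address: 'system', 'P0', or 'P1'."""
    va &= 0xFFFFFFFF
    if va & 0x80000000:          # bit <31> = 1: system space
        return "system"
    if va & 0x40000000:          # bit <31> = 0, bit <30> = 1: P1 region
        return "P1"
    return "P0"                  # bit <31> = 0, bit <30> = 0: P0 region
```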



2.6.3.1 P0 Region Address Translation 

The P0 region of the address space is mapped by the P0 Page Table (P0PT), which is defined by 
the P0 Base Register (P0BR) and the P0 Length Register (P0LR). The P0BR contains the 
page-aligned system virtual address of the P0 Page Table. The P0LR contains the size of the P0PT in 
longwords, that is, the number of Page Table Entries. The Page Table Entry addressed by the P0 
Base Register maps the first page of the P0 region of the virtual address space, that is, virtual 
byte address 0. The P0 base and length registers are shown in Figure 2-16. 

The P0 space translation algorithm is shown graphically in Figure 2-17. 
Figure 2-16: IPR 08 (hex), P0BR and IPR 09 (hex), P0LR 

   P0BR
   Bits      Contents
   <31:30>   1 0
   <29:9>    System Virtual Page Address of P0PT
   <8:0>     0

   P0LR
   Bits      Contents
   <31:22>   0
   <21:0>    Length of P0PT in Longwords



Figure 2-17: P0 Space Translation Algorithm 

   [Diagram: a process-space virtual address (bits <31:30> = 0 0, virtual
   page number, byte offset) is translated through P0BR to a physical
   address formed from the page frame number and the byte offset.]


2.6.3.2 P1 Region Address Translation 

The P1 region of the address space is mapped by the P1 Page Table (P1PT), which is defined by the 
P1 Base Register (P1BR) and the P1 Length Register (P1LR). Because P1 space grows towards 
smaller addresses, and because a consistent hardware interpretation of the base and length 
registers is desirable, P1BR and P1LR describe the portion of P1 space that is NOT accessible. 
Note that P1LR contains the number of nonexistent PTEs. P1BR contains the page-aligned 
virtual address of what would be the PTE for the first page of P1, that is, virtual byte address 
40000000 (hex). The address in P1BR is not necessarily an address in system space, but all the 
addresses of PTEs must be in system space. 

The P1 space translation algorithm is shown graphically in Figure 2-19. 
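
Because the P0 and P1 length registers are interpreted in opposite senses, a small sketch may help. The Python model below is illustrative only: the function and exception names are hypothetical, a 512-byte page and a 4-byte PTE are assumed, and the subsequent translation of the returned PTE virtual address through system space is not modeled.

```python
class LengthViolation(Exception):
    """Raised when a virtual page number falls outside the mapped region."""


def process_pte_address(va, p0br, p0lr, p1br, p1lr):
    """Return the virtual address of the PTE that maps process address va."""
    if va & 0x80000000:
        raise ValueError("not a process-space address")
    vpn = (va & 0x3FFFFFFF) >> 9            # VA<29:9>
    if va & 0x40000000:                     # P1 region
        # P1LR counts the nonexistent PTEs at the low end of P1 space.
        if vpn < p1lr:
            raise LengthViolation(hex(va))
        return (p1br + 4 * vpn) & 0xFFFFFFFF
    # P0 region: P0LR is the number of PTEs in the P0 Page Table.
    if vpn >= p0lr:
        raise LengthViolation(hex(va))
    return (p0br + 4 * vpn) & 0xFFFFFFFF
```

Note that the same base-plus-index calculation is used for both regions; only the direction of the length comparison differs.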
Figure 2-18: IPR 0A (hex), P1BR and IPR 0B (hex), P1LR 

   P1BR
   Bits      Contents
   <31:9>    Virtual Page Address of P1PT
   <8:0>     0

   P1LR
   Bits      Contents
   <31:22>   0
   <21:0>    (2 ** 21) - Length of P1PT in Longwords



Figure 2-19: P1 Space Translation Algorithm 

2.6.4 Page Table Entry 

If the CPU is configured to generate 30-bit physical addresses, it interprets PTEs in the 21-bit 
PFN format shown in Figure 2-20. Conversely, if the CPU is configured to generate 32-bit 
physical addresses, it interprets PTEs in the 25-bit PFN format shown in Figure 2-21. Note that 
bits <24:23> of the 25-bit PFN format are ignored by the NVAX CPU chip, which implements only 
32-bit physical addresses. The PTE formats shown below are described both in DEC Standard 
032, and in Chapter 12. 

Figure 2-20: PTE Format (21-bit PFN) 

   Bits      Contents
   <31>      V
   <30:27>   PROT
   <26>      M
   <25>      Z
   <24:23>   OWN
   <22:21>   S
   <20:0>    Page Frame Number



Figure 2-21: PTE Format (25-bit PFN) 



31 3C 25 28i27 26 25 24|23 22 21 20 US 18 17 16115 14 13 12111 10 OS 06107 06 05 04103 02 CI 00 
v; r?w07 i Ki Si SI S! Page Frame Number ! 
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Table 2-7: PTE Protection Code Access Matrix 




Code                          Current Mode
Decimal  Binary  Mnemonic     K    E    S    U    Comment

0        0000    NA           -    -    -    -    no access
1        0001                 unpredictable       reserved
2        0010    KW           RW   -    -    -
3        0011    KR           R    -    -    -
4        0100    UW           RW   RW   RW   RW   all access
5        0101    EW           RW   RW   -    -
6        0110    ERKW         RW   R    -    -
7        0111    ER           R    R    -    -
8        1000    SW           RW   RW   RW   -
9        1001    SREW         RW   RW   R    -
10       1010    SRKW         RW   R    R    -
11       1011    SR           R    R    R    -
12       1100    URSW         RW   RW   RW   R
13       1101    UREW         RW   RW   R    R
14       1110    URKW         RW   R    R    R
15       1111    UR           R    R    R    R

Access Modes 

K = Kernel 
E = Executive 
S = Supervisor 
U = User 

Access Types 

R = Read 
W = Write 
- = No access 
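
The access matrix in Table 2-7 can be encoded compactly by noting that each protection code grants read access up to one mode and write access up to another. The Python sketch below is an illustrative encoding of the table, not the chip's logic; the names `PROTECTION` and `access_allowed` are hypothetical.

```python
MODES = "KESU"   # Kernel (most privileged) through User (least privileged)

# code -> (least privileged mode that may read, least privileged mode that
# may write); None means that no mode has that kind of access.
PROTECTION = {
    0: (None, None),                                   # NA
    2: ("K", "K"), 3: ("K", None),                     # KW, KR
    4: ("U", "U"), 5: ("E", "E"),                      # UW, EW
    6: ("E", "K"), 7: ("E", None),                     # ERKW, ER
    8: ("S", "S"), 9: ("S", "E"),                      # SW, SREW
    10: ("S", "K"), 11: ("S", None),                   # SRKW, SR
    12: ("U", "S"), 13: ("U", "E"),                    # URSW, UREW
    14: ("U", "K"), 15: ("U", None),                   # URKW, UR
}


def access_allowed(code, mode, write):
    """Return True if `mode` ('K', 'E', 'S', 'U') may read or write the page."""
    if code == 1:
        raise ValueError("protection code 1 is reserved (UNPREDICTABLE)")
    read_limit, write_limit = PROTECTION[code]
    limit = write_limit if write else read_limit
    return limit is not None and MODES.index(mode) <= MODES.index(limit)
```

For example, code 14 (URKW) lets every mode read the page but only kernel mode write it.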



2.6.5 Translation Buffer 

In order to save actual memory references when repeatedly referencing pages, the NVAX CPU 
Chip uses a translation buffer to remember successful virtual address translations and page 
status. The translation buffer contains 96 fully associative entries. Both system and process 
references share these entries. 

Translation buffer entries are replaced using a not-last-used (NLU) algorithm. This algorithm 
guarantees that the replacement pointer is not pointing at the last translation buffer entry to be 
used. This is accomplished by rotating the replacement pointer to the next sequential translation 
buffer entry if it is pointing to an entry that has just been accessed. Both D-stream and I-stream 
references can cause the NLU to cycle. When the translation buffer does not contain a reference's 
virtual address and page status, the machine updates the translation buffer by replacing the 
entry that is selected by the replacement pointer. 
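
The not-last-used policy can be modeled with a single replacement pointer. The following Python sketch is a simplified behavioral model under stated assumptions: the class and method names are hypothetical, entries are identified only by virtual page number, and a fill is assumed to count as an access to the filled entry (so the pointer moves on after a replacement).

```python
class NluTranslationBuffer:
    """Toy model of a fully associative buffer with NLU replacement."""

    def __init__(self, size=96):
        self.tags = [None] * size     # virtual page number held by each entry
        self.pointer = 0              # entry that the next miss will replace

    def _touch(self, index):
        # Rotate the pointer only if it points at the entry just accessed.
        if self.pointer == index:
            self.pointer = (index + 1) % len(self.tags)

    def lookup(self, vpn):
        """Return True on a hit; on a miss, fill the pointed-to entry."""
        if vpn in self.tags:
            self._touch(self.tags.index(vpn))
            return True
        index = self.pointer
        self.tags[index] = vpn
        self._touch(index)
        return False
```

The pointer therefore never designates the most recently used entry, but otherwise the policy makes no attempt to track usage order.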
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2.7 Exceptions and Interrupts 

At certain times during the operation of a system, events within the system require the execution 
of software routines outside the explicit flow of control of instruction execution. An exception is 
an event that is relevant primarily to the currently executing process and normally invokes a 
software routine in the context of the current process. An interrupt is an event which is usually 
due to some activity outside the current process and invokes a software routine outside the context 
of the current process. 

Exceptions and interrupts are reported by constructing a frame on the stack and then dispatching 
to the service routine through an event-specific vector in the System Control Block (SCB). The 
minimum stack frame for any interrupt or exception is a PC/PSL pair as shown in Figure 2-22. 

Figure 2-22: Minimum Exception Stack Frame 

    31                          00
   +------------------------------+
   |              PC              | :(SP)
   +------------------------------+
   |             PSL              |
   +------------------------------+



This minimum stack frame is used for all interrupts. Certain exceptions expand the stack frame 
by pushing additional parameters on the stack above the PC/PSL pair as shown in Figure 2-23. 

Figure 2-23: General Exception Stack Frame 

    31                          00
   +------------------------------+
   |         Parameter n          | :(SP)
   +------------------------------+
   |              :               |
   +------------------------------+
   |         Parameter 1          |
   +------------------------------+
   |              PC              |
   +------------------------------+
   |             PSL              |
   +------------------------------+



What parameters, if any, are pushed on the stack above the PC/PSL pair is a function of the 
specific exception being reported. 

2.7.1 Interrupts 

DEC Standard 032 defines 31 interrupt priority levels, a subset of which is implemented by 
the NVAX CPU. When an interrupt request is generated, the hardware compares the request 
with the current IPL of the CPU. If the new request is of higher priority an internal request is 
generated. At the completion of the current instruction (or at selected points during the execution 
of interruptible instructions), a microcode interrupt handler is invoked to process the request. 
With hardware assistance, the microcode handler determines the highest priority interrupt, 
updates the IPL, pushes a PC/PSL pair on the stack, and dispatches to a macrocode interrupt 
handler through the appropriate location in the SCB. 

Of the 31 interrupt priority levels defined by DEC Standard 032, the NVAX CPU makes use of 
24 of them, as shown in Table 2-8. 



Table 2-8: Interrupt Priority Levels 

IPL (hex)  IPL (decimal)  Interrupt Condition

1F         31             HALT_L asserted (non maskable)
1E         30             PWRFL_L asserted
1D         29             H_ERR_L asserted (or internal hard error detected)
1C         28             Unused
1B         27             Performance monitoring interrupt (internally handled by
                          microcode)
1A         26             S_ERR_L asserted (or internal soft error detected)
18-19      24-25          Unused
17         23             IRQ_L<3> asserted
16         22             IRQ_L<2> or INT_TIM_L asserted (IRQ_L<2> takes priority)
15         21             IRQ_L<1> asserted
14         20             IRQ_L<0> asserted
10-13      16-19          Unused
01-0F      01-15          Software interrupt asserted



Interrupts are discussed in more detail in Chapter 10. 

2.7.1.1 Interrupt Control Registers 

The interrupt system is controlled by three processor registers: the Interrupt Priority Level 
Register (IPL), the Software Interrupt Request Register (SIRR), and the Software Interrupt 
Summary Register (SISR). 

A new interrupt priority level may be loaded into PSL<20:16> by writing the new value to 
IPL<4:0>. The IPL register is shown in Figure 2-24. 

Figure 2-24: IPR 12 (hex), IPL 

   Bits      Contents
   <31:5>    0
   <4:0>     PSL<20:16>


A software interrupt may be requested by writing the desired level to SIRR<3:0>. The SIRR 
register is shown in Figure 2-25. 

Figure 2-25: IPR 14 (hex), SIRR 

   Bits      Contents
   <31:4>    0
   <3:0>     Request IPL

The SISR register records pending software interrupt requests at levels 01 through 0F (hex). The 
SISR register is shown in Figure 2-26. 

Figure 2-26: IPR 15 (hex), SISR 

   Bits      Contents
   <31:16>   0
   <15>      IPL 15 request
   <14>      IPL 14 request
   ...
   <2>       IPL 2 request
   <1>       IPL 1 request
   <0>       0
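
The interaction between SIRR, SISR, and the current IPL can be sketched in a few lines of Python. This is an illustrative model only: the function names are hypothetical, and treating a request at level 0 as having no effect is an assumption based on SISR<0> always reading as zero.

```python
SISR_MASK = 0xFFFE       # bits <15:1> hold the IPL 15 through IPL 1 requests


def request_software_interrupt(sisr, level):
    """Model a write of `level` to SIRR<3:0>; return the new SISR value."""
    level &= 0xF
    if level == 0:       # assumption: there is no software level 0 request
        return sisr & SISR_MASK
    return (sisr | (1 << level)) & SISR_MASK


def highest_pending_level(sisr):
    """Return the highest pending software interrupt level, or 0 if none."""
    sisr &= SISR_MASK
    return sisr.bit_length() - 1 if sisr else 0


def software_interrupt_deliverable(sisr, current_ipl):
    """A request is serviced only if its level exceeds the current IPL."""
    return highest_pending_level(sisr) > current_ipl
```

For instance, after requests at levels 3 and 10, the level 10 request is deliverable when the processor is running at IPL 9 but not at IPL 10.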



2.7.2 Exceptions 

The VAX architecture recognizes six classes of exceptions. Table 2-9 lists instances of exceptions 
in each class. 



Table 2-9: Exception Classes 

Exception Class                     Instances

Arithmetic traps/faults             Integer overflow trap
                                    Integer divide-by-zero trap
                                    Subscript range trap
                                    Floating overflow fault
                                    Floating divide-by-zero fault
                                    Floating underflow fault

Memory management exceptions        Access control violation fault
                                    Translation not valid fault
                                    M=0 fault

Operand reference exceptions        Reserved addressing mode fault
                                    Reserved operand fault or abort

Instruction execution exceptions    Reserved/privileged instruction fault
                                    Emulated instruction faults
                                    XFC fault
                                    Change-mode trap
                                    Breakpoint fault
                                    Vector disabled fault

Tracing exceptions                  Trace fault

System failure exceptions           Kernel-stack-not-valid abort
                                    Interrupt-stack-not-valid halt
                                    Console error halt
                                    Machine check abort



A trap is an exception that occurs at the end of the instruction that caused the exception. 
Therefore, the PC saved on the stack is the address of the next instruction that would normally 
have been executed. 

A fault is an exception that occurs during an instruction and that leaves the registers and memory 
in a consistent state such that elimination of the fault condition and restarting the instruction 
will give correct results. After the instruction faults, the PC saved on the stack points to the 
instruction that faulted. 

An abort is an exception that occurs during an instruction. An abort leaves the value of registers 
and memory UNPREDICTABLE such that the instruction cannot necessarily be correctly 
restarted, completed, simulated, or undone. In most instances, the NVAX microcode attempts to 
convert an abort into a fault by restoring the state that was present at the start of the instruction 
which caused the abort. 

The following sections describe only those exceptions which are unique to the NVAX CPU, or 
where DEC Standard 032 is not clear about the implementation. 

2.7.2.1 Arithmetic Exceptions 

Arithmetic exceptions are detected during the execution of instructions that perform integer or 
floating point arithmetic manipulations. Whether the exception is reported as a trap or a fault 
is a function of the specific event. In any case, the exception is reported through SCB vector 34 
(hex) with the stack frame shown in Figure 2-27. Table 2-10 lists the exceptions reported by 
this mechanism. 

Figure 2-27: Arithmetic Exception Stack Frame 

    31                          00
   +------------------------------+
   |          Type Code           | :(SP)
   +------------------------------+
   |              PC              |
   +------------------------------+
   |             PSL              |
   +------------------------------+



Table 2-10: Arithmetic Exceptions 

Type Code
Decimal  Hex  Type   Exception

1        1    Trap   Integer overflow
2        2    Trap   Integer divide-by-zero
7        7    Trap   Subscript range
8        8    Fault  Floating overflow
9        9    Fault  Floating divide-by-zero
10       A    Fault  Floating underflow
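
A handler for SCB vector 34 (hex) typically begins by decoding the type code at the top of the stack. The Python sketch below restates Table 2-10 as a lookup and combines it with the trap and fault definitions given earlier; the names `ARITHMETIC_EXCEPTIONS` and `describe_arithmetic_exception` are hypothetical.

```python
# Type code -> (trap or fault, exception name), as listed in Table 2-10.
ARITHMETIC_EXCEPTIONS = {
    1: ("trap", "integer overflow"),
    2: ("trap", "integer divide-by-zero"),
    7: ("trap", "subscript range"),
    8: ("fault", "floating overflow"),
    9: ("fault", "floating divide-by-zero"),
    10: ("fault", "floating underflow"),
}


def describe_arithmetic_exception(type_code):
    """Return (kind, name, what the saved PC points to) for a type code."""
    kind, name = ARITHMETIC_EXCEPTIONS[type_code]
    # A trap saves the PC of the next instruction; a fault saves the PC of
    # the instruction that faulted, so that it can be restarted.
    saved_pc = "next instruction" if kind == "trap" else "faulting instruction"
    return kind, name, saved_pc
```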



2.7.2.2 Memory Management Exceptions 

Memory management exceptions are detected during a memory reference and are always reported 
as faults. The three memory management exceptions are listed in Table 2-11. All three exceptions 
push the same frame on the stack, as shown in Figure 2-28. The top longword of the stack frame 
contains a fault parameter whose bits are described in Table 2-12. 



Table 2-11: Memory Management Exceptions 

SCB Vector   Exception

20 (hex)     Access control violation
24 (hex)     Translation not valid
3C (hex)     Modify fault

Figure 2-28: Memory Management Exception Stack Frame 

    31                                        00
   +--------------------------------------------+
   |               0                | M | P | L | :(SP)
   +--------------------------------------------+
   | Some Virtual Address in the Faulting Page  |
   +--------------------------------------------+
   |                     PC                     |
   +--------------------------------------------+
   |                    PSL                     |
   +--------------------------------------------+



Table 2-12: Memory Management Exception Fault Parameter 

Bit   Mnemonic   Meaning

0     L          Length violation
1     P          PTE reference
2     M          Modify or write intent
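
The fault parameter is a small bit mask, so decoding it is straightforward. The following Python sketch is illustrative; the function name and the dictionary keys are hypothetical labels for the three bits in Table 2-12.

```python
def decode_fault_parameter(param):
    """Decode the memory management fault parameter longword."""
    return {
        "length_violation": bool(param & 0x1),        # bit 0, L
        "pte_reference": bool(param & 0x2),           # bit 1, P
        "modify_or_write_intent": bool(param & 0x4),  # bit 2, M
    }
```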



2.7.2.3 Emulated Instruction Exceptions 

The NVAX CPU implements the VAX base instruction group. For certain instructions outside 
that group, the NVAX microcode provides support for the macrocode emulation of instructions. 
There are two types of emulation exceptions, depending on whether PSL<FPD> is set at the 
beginning of the instruction. 

If PSL<FPD>=0 at the beginning of the instruction, the exception is reported through SCB vector 
C8 (hex) as a trap with the stack frame shown in Figure 2-29. The longwords in the stack frame 
are described in Table 2-13. 

Figure 2-29: Instruction Emulation Trap Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                            Opcode                                             | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                            Old PC                                             |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #1                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #2                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #3                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #4                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #5                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #6                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #7                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #8                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                            New PC                                             |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



Table 2-13: Instruction Emulation Trap Stack Frame

Location    Use

Opcode      Zero-extended opcode of the emulated instruction.
Old PC      PC of the opcode of the emulated instruction.
Specifiers  Address of the specified operand for specifiers of access type write (.wx) or address
            (.ax). Operand value for specifiers of access type read (.rx). For read-type operands
            whose size is smaller than a longword, the remaining bits are UNPREDICTABLE.
            For those instructions that don't have 8 specifiers, the remaining specifier longwords
            contain UNPREDICTABLE values.
New PC      PC of the instruction following the emulated instruction.
PSL         PSL saved at the time of the trap.



If PSL<FPD>=1 at the beginning of the instruction, the exception is reported through SCB vector 
CC (hex) as a fault with the stack frame shown in Figure 2-30. In this case, PC is that of the 
opcode of the emulated instruction. 

Figure 2-30: Suspended Emulation Fault Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



2.7.2.4 Vector Unit Disabled Fault 

When the NVAX CPU attempts to issue a vector instruction to the optional vector processor, it 
may discover that the vector unit is disabled. In this case, a vector unit disabled fault is initiated 
through SCB vector 68 (hex). There are no parameters for this exception (besides the usual 
PC/PSL pair), and the reason for the exception must be determined by reading the appropriate 
vector unit registers. 



2.7.2.5 Machine Check Exceptions 

A machine check exception is reported through SCB vector 04 (hex) when the NVAX CPU detects 
an error condition. The frame pushed on the stack for a machine check indicates the type of error 
and provides internal state information that may help identify the cause of the error. The generic 
machine check stack frame is shown in Figure 2-31. Machine checks are discussed at length in 
Chapter 15. 



Figure 2-31: Generic Machine Check Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                       Byte Count of Parameters, Excluding This Longword                       | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
:                                                                                               :
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



2.7.2.6 Console Halts 

In certain microcode flows, the NVAX microcode may detect an inconsistency in internal state, 
a kernel-mode HALT, or a system reset. In these instances, the microcode initiates a hardware 
restart sequence which passes control to the console program. 

When a hardware restart sequence is initiated, the NVAX microcode saves the current CPU 
state, partially initializes the CPU, and passes control to the console program at physical address 
E0040000 (hex). 

During a hardware restart sequence, the stack pointer is saved in the appropriate stack pointer 
IPR (0 through 4), the current PC is saved in IPR 42 (SAVPC), and the current PSL, halt code, 
and validity flag are saved in IPR 43 (SAVPSL). The formats of SAVPC and SAVPSL are shown
in Figure 2-32.
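
Figure 2-32: IPR 2A (hex), SAVPC and IPR 2B (hex), SAVPSL

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                           Saved PC                                            | :SAVPC
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  PSL<31:16>                   |  |  |    Halt Code    |       PSL<7:0>        | :SAVPSL
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                 |  |
                                      MAPEN<0> --+  |
                              Invalid SAVPSL if 1 --+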



Console halts are discussed in detail in Chapter 15.
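
A console program might unpack SAVPSL along the lines of the following Python sketch. It assumes the field positions read from Figure 2-32 (MAPEN<0> in bit 15, the invalid flag in bit 14, and the halt code in bits 13:8); the function name decode_savpsl is hypothetical and is introduced only for this example.

```python
def decode_savpsl(savpsl: int) -> dict:
    """Unpack a SAVPSL value (bit positions assumed from Figure 2-32)."""
    return {
        # PSL<31:16> and PSL<7:0> are saved in place; bits 15:8 hold other fields.
        "psl": savpsl & 0xFFFF00FF,
        "mapen": (savpsl >> 15) & 0x1,            # MAPEN<0>
        "invalid": bool((savpsl >> 14) & 0x1),    # SAVPSL is invalid if 1
        "halt_code": (savpsl >> 8) & 0x3F,        # 6-bit halt code
    }
```

The meaning of each halt code value is not covered here; see Chapter 15.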



2.8 System Control Block 

The System Control Block (SCB) is a page containing the vectors for servicing interrupts and 
exceptions. The SCB is pointed to by the System Control Block Base Register (SCBB), whose 
format is shown in Figure 2-33. For best performance, SCBB should contain a page-aligned
address. Microcode forces a longword-aligned SCBB by clearing bits <1:0> of the new value 
before loading the register. 

NOTE 

When the CPU is configured to generate 30-bit physical addresses, SCBB<31:30> are 
ignored. 

Figure 2-33: IPR 11 (hex), SCBB

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    Physical Page Address of SCB                    |        SBZ         | 0  0|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



2.8.1 System Control Block Vectors 

An SCB vector is an aligned longword in the SCB through which the NVAX microcode dispatches 
interrupts and exceptions. Each SCB vector has the format shown in Figure 2-34. The fields of 
the vector are described in Table 2-14. 

Figure 2-34: System Control Block Vector

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                           longword address of service routine                           | code|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



Table 2-14: System Control Block Vector

Bits   Contents

31:2   Virtual address of the service routine for the interrupt or exception. The routine must be
       longword aligned, as the microcode forces the lower two bits of the address to 00.

1:0    Code, interpreted as follows:

       Value  Meaning

       00     The event is to be serviced on the kernel stack unless the CPU is already on the
              interrupt stack, in which case the event is serviced on the interrupt stack.

       01     The event is to be serviced on the interrupt stack. If the event is an exception, the
              IPL is raised to 1F (hex).

       10     Unimplemented, results in a console error halt.

       11     Unimplemented, results in a console error halt.
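
The vector format in Table 2-14 can be illustrated with a short Python sketch. This models only the field split and the SCBB alignment rule described in Section 2.8; it is not the microcode dispatch sequence. The function names and the use of ValueError to stand in for a console error halt are assumptions made for this example.

```python
def scb_vector_address(scbb: int, offset: int) -> int:
    """Physical address of the SCB vector at the given byte offset.

    Bits <1:0> of SCBB are cleared, as the microcode does when loading it.
    """
    return (scbb & 0xFFFFFFFC) + offset


def decode_scb_vector(vector: int):
    """Return (service routine address, code) for one SCB vector longword."""
    code = vector & 0b11
    if code in (0b10, 0b11):
        # Codes 10 and 11 are unimplemented and result in a console error halt.
        raise ValueError("unimplemented SCB vector code")
    # The lower two bits of the routine address are forced to 00.
    return vector & 0xFFFFFFFC, code
```

A code of 0 selects the kernel stack (or the interrupt stack if the CPU is already on it); a code of 1 selects the interrupt stack.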



2.8.2 System Control Block Layout 

The System Control Block layout is shown in Table 2-15. 



Table 2-15: System Control Block Layout

Vector   Name                                             Type         Param  Notes

00       passive release                                  interrupt           IPL is raised to request IPL
04       machine check                                    abort               parameters reflect machine state;
                                                                              must be serviced on interrupt stack
08       kernel stack not valid                           abort               must be serviced on interrupt stack
0C       power fail                                       interrupt           IPL is raised to 1E (hex)
10       reserved/privileged instruction                  fault
14       customer reserved instruction                    fault               XFC instruction
18       reserved operand                                 fault/abort         not always recoverable
1C       reserved addressing mode                         fault
20       access control violation/vector alignment fault  fault               parameters are virtual address,
                                                                              status code






DIGITAL CONFIDENTIAL 



NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 



Table 2-15 (Cont.): System Control Block Layout

Vector   Name                                             Type         Param  Notes

24       translation not valid                            fault               parameters are virtual address,
                                                                              status code
28       trace pending                                    fault
2C       breakpoint instruction                           fault
30       unused                                           -                   compatibility mode in other VAXes
34       arithmetic                                       trap/fault          parameter is type code
38-3C    unused                                           -
40       CHMK                                             trap                parameter is sign-extended
                                                                              operand word
44       CHME                                             trap                parameter is sign-extended
                                                                              operand word
48       CHMS                                             trap                parameter is sign-extended
                                                                              operand word
4C       CHMU                                             trap                parameter is sign-extended
                                                                              operand word
50       unused                                           -
54       soft error notification                          interrupt           IPL is 1A (hex)
58       performance monitoring counter overflow          interrupt           Internal interrupt at IPL 1B
                                                                              (hex). This vector supplies
                                                                              the physical base address of
                                                                              the block of performance
                                                                              monitoring counts in memory.
                                                                              See Chapter 18 for details.
5C       unused
60       hard error notification                          interrupt           IPL is 1D (hex)
64       unused
68       vector unit disabled                             fault               vector instructions
6C-7C    unused
80       interprocessor interrupt                         interrupt           IPL is 16 (hex)
84       software level 1                                 interrupt
88       software level 2                                 interrupt           ordinarily used for AST delivery
8C       software level 3                                 interrupt           ordinarily used for process
                                                                              scheduling
90-BC    software levels 4-15                             interrupt
C0       interval timer                                   interrupt           IPL is 16 (hex)
C4       unused
C8       emulation start                                  fault        10     same mode exception, FPD=0;
                                                                              parameters are opcode,
                                                                              PC, specifiers
CC       emulation continue                               fault        0      same mode exception, FPD=1; no
                                                                              parameters
D0-F4    unused
F8       console receiver                                 interrupt    0      IPL is 15 (hex)
FC       console transmitter                              interrupt    0      IPL is 15 (hex)
100-FFFC device vectors                                   interrupt    0      Device interrupt vectors



2.9 CPU Identification 



Software may quickly determine on which CPU it is executing in a multi-processor system by 
reading the CPUID processor register. The format of this register is shown in Figure 2-35. 



Figure 2-35: IPR 0E (hex), CPUID

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0|  CPU Identification   | :CPUID
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



The CPUID processor register is implemented internally as an 8-bit read-write register. The
source of the CPU ID information is system-specific. At powerup, the console firmware is
responsible for determining the CPU ID from the system-specific source and writing the
correct value to the CPUID register.



2.10 System Identification 

The System Identification Register (SID) is a read-only register which includes the system
(actually the CPU) type and the microcode revision number. The format of the SID register is
shown in Figure 2-36.

Figure 2-36: IPR 3E (hex), SID

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|       CPU Type        | 0  0  0  0  0  0  0  0  0  0|Patch Revision|NS|  Microcode Revision   | :SID
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 2-16: SID Field Descriptions

Name                Extent  Type  Description

Microcode Revision  7:0     RO    This field contains the microcode (chip) revision number.
                                  This number is incremented for each pass of the chip.

NS                  8       RO,0  If this bit is a zero, there is either no microcode patch
                                  loaded, or the patch is a standard patch. If this bit
                                  is a one, a non-standard microcode patch is loaded. A
                                  non-standard patch is one which goes beyond the formally
                                  released patches, such as a patch used for performance
                                  analysis. This bit is cleared on chip reset.

Patch Revision      13:9    RO,0  If this field is zero, no microcode patch is loaded. If this
                                  field is non-zero, a microcode patch is loaded and this field
                                  indicates the patch number. This field is cleared on chip
                                  reset.

CPU Type            31:24   RO    This field contains 19 (decimal), indicating that this is an
                                  NVAX CPU.



NOTE 

The patch revision and non-standard patch fields (SID<13:8>) were added in pass 2 of 
the NVAX chip. 
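
As a minimal sketch of how software might split a SID value into the fields of Table 2-16, consider the following Python fragment. The function name decode_sid and the dictionary keys are hypothetical; the bit extents are those given in the table.

```python
NVAX_CPU_TYPE = 19  # decimal value of SID<31:24> for an NVAX CPU (Table 2-16)


def decode_sid(sid: int) -> dict:
    """Split a SID longword into its fields."""
    return {
        "cpu_type": (sid >> 24) & 0xFF,           # SID<31:24>
        "patch_revision": (sid >> 9) & 0x1F,      # SID<13:9>, 0 if no patch loaded
        "non_standard_patch": bool((sid >> 8) & 0x1),  # SID<8>, NS
        "microcode_revision": sid & 0xFF,         # SID<7:0>
    }
```

Note that on a pass 1 chip the patch revision and NS fields are not implemented, so software should not rely on them there.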



2.11 Process Structure 

A process is a single thread of execution. The context of the current process is contained in the 
Process Control Block (PCB). The PCB is pointed to by the Process Control Block Base register 
(PCBB), which is shown in Figure 2-37. The format of the process control block is shown in 
Figure 2-38. Microcode forces a longword-aligned PCBB by clearing bits <1:0> of the new value 
before loading the register. 

NOTE 

When the CPU is configured to generate 30-bit physical addresses, PCBB<31:30> are 
ignored. 

Figure 2-37: IPR 10 (hex), PCBB

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                          Physical Longword Address of the PCB                           | 0  0| :PCBB
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 2-38: Process Control Block

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              KSP                                              | :PCB
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              ESP                                              | +4
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              SSP                                              | +8
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              USP                                              | +12
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R0                                               | +16
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R1                                               | +20
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R2                                               | +24
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R3                                               | +28
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R4                                               | +32
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R5                                               | +36
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R6                                               | +40
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R7                                               | +44
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R8                                               | +48
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R9                                               | +52
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R10                                              | +56
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R11                                              | +60
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                            AP(R12)                                            | +64
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                            FP(R13)                                            | +68
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               | +72
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              | +76
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                             P0BR                                              | +80
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0| ASTLVL | 0  0|                              P0LR                               | +84
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                             P1BR                                              | +88
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|  | 0  0  0  0  0  0  0  0  0|                              P1LR                               | +92
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
  |
  +-- PME

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
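
The PCB layout in Figure 2-38 can be summarized in a short Python sketch that unpacks a 96-byte memory image into named fields. This is an illustration under stated assumptions: the function name parse_pcb and the dictionary keys are hypothetical, the image is taken to be in VAX little-endian byte order, and the field positions in the longwords at +84 and +92 are those shown in the figure.

```python
import struct

# Longword order of the PCB, from offset +0 to +92 (Figure 2-38).
PCB_LONGWORDS = (
    ["KSP", "ESP", "SSP", "USP"]
    + [f"R{i}" for i in range(12)]
    + ["AP", "FP", "PC", "PSL", "P0BR", "_P0LR_ASTLVL", "P1BR", "_P1LR_PME"]
)


def parse_pcb(raw: bytes) -> dict:
    """Unpack a 96-byte PCB image into a dictionary of fields."""
    if len(raw) != 4 * len(PCB_LONGWORDS):
        raise ValueError("a PCB image is 24 longwords (96 bytes)")
    pcb = dict(zip(PCB_LONGWORDS, struct.unpack("<24I", raw)))
    lw84 = pcb.pop("_P0LR_ASTLVL")
    lw92 = pcb.pop("_P1LR_PME")
    pcb["P0LR"] = lw84 & 0x3FFFFF         # bits 21:0 of the longword at +84
    pcb["ASTLVL"] = (lw84 >> 24) & 0x7    # bits 26:24 of the longword at +84
    pcb["P1LR"] = lw92 & 0x3FFFFF         # bits 21:0 of the longword at +92
    pcb["PME"] = (lw92 >> 31) & 0x1       # bit 31 of the longword at +92
    return pcb
```

Because PCBB holds a physical address, such an image would be read from physical memory at the longword-aligned address in PCBB.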

2.12 Processor Registers

The processor registers that are implemented by the NVAX CPU chip, and those that are required 
of the system environment, are logically divided into five groups, as follows: 

• Normal — Those IPRs that address individual registers in the NVAX CPU chip or system 
environment. 

• Bcache tag IPRs — The read-write block of IPRs that allow direct access to the Bcache tags. 

• Bcache deallocate IPRs — The write-only block of IPRs by which a Bcache block may be 
deallocated. 

• Pcache tag IPRs — The read-write block of IPRs that allow direct access to the Pcache tags. 

• Pcache data parity IPRs — The read-write block of IPRs that allow direct access to the Pcache 
data parity bits. 

Each group of IPRs is distinguished by a particular pattern of bits in the IPR address, as shown 
in Figure 2-39. 

Figure 2-39: IPR Address Space Decoding

Normal IPR Address

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                  SBZ                                  |      IPR Number       |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Bcache Tag IPR Address

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|        SBZ         | 1| 0| 0| x|               Bcache Tag Index                |     SBZ      |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Bcache Deallocate IPR Address

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|        SBZ         | 1| 0| 1| x|          Bcache Tag Deallocate Index          |     SBZ      |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Pcache Tag IPR Address

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|        SBZ         | 1| 1| 0|           SBZ            |  |  Pcache Tag Index  |     SBZ      |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                          |
                    Pcache Set Select (0-left, 1-right) --+

Pcache Data Parity IPR Address

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|        SBZ         | 1| 1| 1|           SBZ            |  |  Pcache Tag Index  |     |  SBZ   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                          |                         |
                    Pcache Set Select (0-left, 1-right) --+                         |
                                                                  Subblock Select --+

The numeric range for each of the five groups is shown in Table 2-17.

Table 2-17: IPR Address Space Decoding

                                IPR Address
IPR Group           Mnemonic²   Range (hex)          Contents

Normal                          00000000..000000FF¹  256 individual IPRs.

Bcache Tag          BCTAG       01000000..011FFFE0¹  64k Bcache tag IPRs, each separated by 20 (hex)
                                                     from the previous one.

Bcache Deallocate   BCFLUSH     01400000..015FFFE0¹  64k Bcache tag deallocate IPRs, each separated
                                                     by 20 (hex) from the previous one.

Pcache Tag          PCTAG       01800000..01801FE0¹  256 Pcache tag IPRs, 128 for each Pcache set,
                                                     each separated by 20 (hex) from the previous
                                                     one.

Pcache Data Parity  PCDAP       01C00000..01C01FF8¹  1024 Pcache data parity IPRs, 512 for each
                                                     Pcache set, each separated by 8 (hex) from the
                                                     previous one.



¹ Unused fields in the IPR addresses for these groups should be zero. Neither hardware nor microcode detects and faults on
an address in which these bits are non-zero. Although non-contiguous address ranges are shown for these groups, the entire
IPR address space maps into one of these groups. If these fields are non-zero, the operation of the CPU is UNDEFINED.

² The mnemonic is for the first IPR in the block.



NOTE 

The address ranges shown above are those used by the programmer. When processing 
normal IPRs, the microcode shifts the IPR number left by 2 bits for use as an IPR 
command address. This positions the IPR number to bits <9:2> and modifies the 
address range as seen by the hardware to 0..3FC, with bits <1:0>=00. No shifting 
is performed for the other groups of IPR addresses. 

Because of the sparse addressing used for IPRs in groups other than the normal group, valid IPR 
addresses are not separated by one. Rather, valid IPR addresses are separated by either 8 or 
20(hex). For example, the IPR address for Bcache tag 0 is 01000000 (hex), and the IPR address 
for Bcache tag 1 is 01000020 (hex). In this group, bits <4:0> of the IPR address are ignored, so 
IPR numbers 01000001 through 0100001F all address Bcache tag 0. Similarly, the IPR address 
for the first subblock of Pcache data parity is 01C00000 (hex), and the IPR address for the second 
subblock of Pcache data parity is 01C00008 (hex). 
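
The sparse addressing described above can be sketched in Python as follows. This is an illustrative model built from the ranges in Table 2-17, not a description of the hardware decoder. The function names are hypothetical, and treating bit <24> clear as the normal group is an assumption based on the statement that the entire IPR address space maps into one of the groups.

```python
def bcache_tag_ipr(index: int) -> int:
    """IPR address of Bcache tag `index` (tags are 20 (hex) apart)."""
    if not 0 <= index < 0x10000:
        raise ValueError("Bcache tag index is 16 bits")
    return 0x01000000 | (index << 5)


def pcache_parity_ipr(set_select: int, index: int, subblock: int) -> int:
    """IPR address of one Pcache data parity subblock (8 (hex) apart)."""
    if set_select not in (0, 1) or not 0 <= index < 128 or not 0 <= subblock < 4:
        raise ValueError("field out of range")
    return 0x01C00000 | (set_select << 12) | (index << 5) | (subblock << 3)


def ipr_group(address: int) -> str:
    """Classify an IPR address by bits <24:22> (assumed decode)."""
    if not (address >> 24) & 0x1:
        return "Normal"
    return ("Bcache Tag", "Bcache Deallocate",
            "Pcache Tag", "Pcache Data Parity")[(address >> 22) & 0x3]
```

With these definitions, Bcache tag 1 maps to 01000020 (hex) and the second Pcache data parity subblock maps to 01C00008 (hex), matching the examples in the text.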

Processor registers in all groups except the normal group are processed entirely by the NVAX 
CPU chip and will never appear on the NDAL. This is also true for a number of the IPRs in 
the normal group. IPRs in the normal group that are not processed by the NVAX CPU chip are 
converted into I/O space references and passed to the system environment via a read or write 
command on the NDAL. 

Each of the 256 possible IPRs in the normal group is of longword length, so a 1KB block of I/O
space is required to convert each possible IPR to a unique I/O space longword. This block starts
at address E1000000 (hex). Conversion of an IPR address to an I/O space address in this block
is done by shifting the IPR address left into bits <9:2>, filling bits <1:0> with zeros, and merging
in the base address of the block. This can be expressed by the equation

IO_ADDRESS = E1000000 (hex) + (IPR_NUMBER * 4)

The actual hardware implementation of this is different in that the IPR number is shifted left by 
2 bits, and bits <31:30,24> are set. There is no multiply or add done as one might conclude from 
the equation. 
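
The conversion can be sketched in Python as a shift and a bitwise merge, which gives the same result as the equation because the shifted IPR number never overlaps the bits of the base address. The names IPR_IO_BASE and ipr_to_io_address are hypothetical and are used only for this example.

```python
IPR_IO_BASE = 0xE1000000  # start of the 1KB I/O space block for normal IPRs


def ipr_to_io_address(ipr_number: int) -> int:
    """I/O space address for a normal-group IPR (0..FF hex)."""
    if not 0 <= ipr_number <= 0xFF:
        raise ValueError("normal-group IPR numbers are 0..FF (hex)")
    # Shift into bits <9:2> (bits <1:0> become zero), then merge in the base.
    return IPR_IO_BASE | (ipr_number << 2)
```

For example, IPR 1B (hex), TODR, maps to I/O address E100006C (hex), which agrees with the entry in Table 2-18.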

Because many of the 256 possible IPRs in the normal group are processed entirely by the NVAX 
CPU chip, the corresponding I/O space location in the 1KB block is never referenced as a result of 
an MTPR/MFPR to or from these IPRs. However, note that a programmer can indeed reference 
these locations via an explicit I/O space reference with, e.g., MOVL. References to this block of I/O 
space locations with instructions other than MTPR/MFPR may result in UNDEFINED behavior. 

The processor registers implemented by the NVAX CPU are shown in Table 2-18.

NOTE 

Many of the processor registers listed in Table 2-18 are used internally by the 
microcode during normal operation of the CPU, and are not intended to be referenced 
by software except during test or diagnosis of the system. These registers are flagged
with the notation "Testability and diagnostic use only, not for software use in normal 
operation". References by software to these registers during normal operation can 
cause UNDEFINED behavior of the CPU. 

Table 2-18: Processor Registers

                                                     Number
Register Name                               Mnemonic (Dec) (Hex) Type  Impl    Cat  I/O Address

Kernel Stack Pointer                        KSP      0     0     RW    NVAX    1-1
Executive Stack Pointer                     ESP      1     1     RW    NVAX    1-1
Supervisor Stack Pointer                    SSP      2     2     RW    NVAX    1-1
User Stack Pointer                          USP      3     3     RW    NVAX    1-1
Interrupt Stack Pointer                     ISP      4     4     RW    NVAX    1-1
Reserved                                             5     5                   3    E1000014
Reserved                                             6     6                   3    E1000018
Reserved                                             7     7                   3    E100001C
P0 Base Register                            P0BR     8     8     RW    NVAX    1-2
P0 Length Register                          P0LR     9     9     RW    NVAX    1-2
P1 Base Register                            P1BR     10    A     RW    NVAX    1-2
P1 Length Register                          P1LR     11    B     RW    NVAX    1-2
System Base Register                        SBR      12    C     RW    NVAX    1-2
System Length Register                      SLR      13    D     RW    NVAX    1-2
CPU Identification¹                         CPUID    14    E     RW    NVAX    2-1
Reserved                                             15    F                   3    E100003C
Process Control Block Base                  PCBB     16    10    RW    NVAX    1-1
System Control Block Base                   SCBB     17    11    RW    NVAX    1-1
Interrupt Priority Level¹                   IPL      18    12    RW    NVAX    1-1
AST Level¹                                  ASTLVL   19    13    RW    NVAX    1-1
Software Interrupt Request Register         SIRR     20    14    W     NVAX    1-1
Software Interrupt Summary Register¹        SISR     21    15    RW    NVAX    1-1
Reserved                                             22    16                  3    E1000058
Reserved                                             23    17                  3    E100005C
Interval Counter Control/Status¹,²          ICCS     24    18    RW    NVAX    2-7  E1000060
Next Interval Count                         NICR     25    19    W     System  3-7  E1000064
Interval Count                              ICR      26    1A    R     System  3-7  E1000068
Time of Year Register                       TODR     27    1B    RW    System  2-3  E100006C
Console Storage Receiver Status             CSRS     28    1C    RW    System  2-3  E1000070
Console Storage Receiver Data               CSRD     29    1D    R     System  2-3  E1000074
Console Storage Transmitter Status          CSTS     30    1E    RW    System  2-3  E1000078
Console Storage Transmitter Data            CSTD     31    1F    W     System  2-3  E100007C
Console Receiver Control/Status             RXCS     32    20    RW    System  2-3  E1000080

¹ Initialized on reset

² Subset or full implementation depending on ECR control bit

Table 2-18 (Cont.): Processor Registers

                                                     Number
Register Name                               Mnemonic (Dec) (Hex) Type  Impl    Cat  I/O Address

Console Receiver Data Buffer                RXDB     33    21    R     System  2-3  E1000084
Console Transmitter Control/Status          TXCS     34    22    RW    System  2-3  E1000088
Console Transmitter Data Buffer             TXDB     35    23    W     System  2-3  E100008C
Reserved                                             36    24                  3    E1000090
Reserved                                             37    25                  3    E1000094
Machine Check Error Register                MCESR    38    26    W     NVAX    2-1
Reserved                                             39    27                  3    E100009C
Reserved                                             40    28                  3    E10000A0
Reserved                                             41    29                  3    E10000A4
Console Saved PC                            SAVPC    42    2A    R     NVAX    2-1
Console Saved PSL                           SAVPSL   43    2B    R     NVAX    2-1
Reserved                                             44    2C                  3    E10000B0
Reserved                                             45    2D                  3    E10000B4
Reserved                                             46    2E                  3    E10000B8
Reserved                                             47    2F                  3    E10000BC
Reserved                                             48    30                  3    E10000C0
Reserved                                             49    31                  3    E10000C4
Reserved                                             50    32                  3    E10000C8
Reserved                                             51    33                  3    E10000CC
Reserved                                             52    34                  3    E10000D0
Reserved                                             53    35                  3    E10000D4
Reserved                                             54    36                  3    E10000D8
I/O System Reset Register                   IORESET  55    37    W     System  2-3  E10000DC
Memory Management Enable¹                   MAPEN    56    38    RW    NVAX    1-2
Translation Buffer Invalidate All           TBIA     57    39    W     NVAX    1-1
Translation Buffer Invalidate Single        TBIS     58    3A    W     NVAX    1-1
Reserved                                             59    3B                  3    E10000EC
Reserved                                             60    3C                  3    E10000F0
Performance Monitor Enable¹                 PME      61    3D    RW    NVAX    2-1
System Identification                       SID      62    3E    R     NVAX    2-1
Translation Buffer Check                    TBCHK    63    3F    W     NVAX    1-1

¹ Initialized on reset

Table 2-18 (Cont.): Processor Registers

                                                     Number
Register Name                               Mnemonic (Dec) (Hex) Type  Impl    Cat  I/O Address

IPL 14 Interrupt ACK³                       IAK14    64    40    R     System  2-3  E1000100
IPL 15 Interrupt ACK³                       IAK15    65    41    R     System  2-3  E1000104
IPL 16 Interrupt ACK³                       IAK16    66    42    R     System  2-3  E1000108
IPL 17 Interrupt ACK³                       IAK17    67    43    R     System  2-3  E100010C
Clear Write Buffer³                         CWB      68    44    RW    System  2-3  E1000110
Reserved                                             69    45                  3    E1000114
Reserved                                             70    46                  3    E1000118
Reserved                                             71    47                  3    E100011C
Reserved                                             72    48                  3    E1000120
Reserved                                             73    49                  3    E1000124
Reserved                                             74    4A                  3    E1000128
Reserved                                             75    4B                  3    E100012C
Reserved                                             76    4C                  3    E1000130
Reserved                                             77    4D                  3    E1000134
Reserved                                             78    4E                  3    E1000138
Reserved                                             79    4F                  3    E100013C
Reserved                                             80    50                  3    E1000140
Reserved                                             81    51                  3    E1000144
Reserved                                             82    52                  3    E1000148
Reserved                                             83    53                  3    E100014C
Reserved                                             84    54                  3    E1000150
Reserved                                             85    55                  3    E1000154
Reserved                                             86    56                  3    E1000158
Reserved                                             87    57                  3    E100015C
Reserved                                             88    58                  3    E1000160
Reserved                                             89    59                  3    E1000164
Reserved                                             90    5A                  3    E1000168
Reserved                                             91    5B                  3    E100016C
Reserved                                             92    5C                  3    E1000170
Reserved                                             93    5D                  3    E1000174
Reserved                                             94    5E                  3    E1000178
Reserved                                             95    5F                  3    E100017C

³ Testability and diagnostic use only; not for software use in normal operation

Table 2-18 (Cont.): Processor Registers

                                                     Number
Register Name                               Mnemonic (Dec) (Hex) Type  Impl    Cat  I/O Address

Reserved                                             96    60                  3    E1000180
Reserved                                             97    61                  3    E1000184
Reserved                                             98    62                  3    E1000188
Reserved                                             99    63                  3    E100018C
Reserved for VM                                      100   64                  3    E1000190
Reserved for VM                                      101   65                  3    E1000194
Reserved for VM                                      102   66                  3    E1000198
Reserved                                             103   67                  3    E100019C
Reserved                                             104   68                  3    E10001A0
Reserved                                             105   69                  3    E10001A4
Reserved                                             106   6A                  3    E10001A8
Reserved                                             107   6B                  3    E10001AC
Reserved                                             108   6C                  3    E10001B0
Reserved                                             109   6D                  3    E10001B4
Reserved                                             110   6E                  3    E10001B8
Reserved                                             111   6F                  3    E10001BC
Reserved                                             112   70                  3    E10001C0
Reserved                                             113   71                  3    E10001C4
Reserved                                             114   72                  3    E10001C8
Reserved                                             115   73                  3    E10001CC
Reserved                                             116   74                  3    E10001D0
Reserved                                             117   75                  3    E10001D4
Reserved                                             118   76                  3    E10001D8
Reserved                                             119   77                  3    E10001DC
Reserved for Ebox                                    120   78                  2-6  E10001E0
Reserved for Ebox                                    121   79                  2-6  E10001E4
Interrupt System Status Register³           INTSYS   122   7A    RW    NVAX    2-1
Performance Monitoring Facility Count       PMFCNT   123   7B    RW    NVAX    2-1
Patchable Control Store Control Register³   PCSCR    124   7C    RW    NVAX    2-1
Ebox Control Register                       ECR      125   7D    RW    NVAX    2-1
Mbox TB Tag Fill³                           MTBTAG   126   7E    W     NVAX    2-1
Mbox TB PTE Fill³                           MTBPTE   127   7F    W     NVAX    2-1

³ Testability and diagnostic use only; not for software use in normal operation
Table 2-18 (Cont.): Processor Registers

                                                     Number
Register Name                               Mnemonic (Dec)  (Hex)  Type  Impl    Cat  I/O Address

Reserved for Vectors                                 128    80                   3    E1000200
Reserved for Vectors                                 129    81                   3    E1000204
Reserved for Vectors                                 130    82                   3    E1000208
Reserved for Vectors                                 131    83                   3    E100020C
Reserved for Vectors                                 132    84                   3    E1000210
Reserved for Vectors                                 133    85                   3    E1000214
Reserved for Vectors                                 134    86                   3    E1000218
Reserved for Vectors                                 135    87                   3    E100021C
Reserved for Vectors                                 136    88                   3    E1000220
Reserved for Vectors                                 137    89                   3    E1000224
Reserved for Vectors                                 138    8A                   3    E1000228
Reserved for Vectors                                 139    8B                   3    E100022C
Reserved for Vectors                                 140    8C                   3    E1000230
Reserved for Vectors                                 141    8D                   3    E1000234
Reserved for Vectors                                 142    8E                   3    E1000238
Reserved for Vectors                                 143    8F                   3    E100023C
Vector Processor Status Register            VPSR     144    90     RW    Vector  3    E1000240
Vector Arithmetic Exception Register        VAER     145    91     R     Vector  3    E1000244
Vector Memory Activity Register             VMAC     146    92     R     Vector  3    E1000248
Vector Trans. Buffer Invalidate All         VTBIA    147    93     W     Vector  3    E100024C
Reserved for Vectors                                 148    94                   3    E1000250
Reserved for Vectors                                 149    95                   3    E1000254
Reserved for Vectors                                 150    96                   3    E1000258
Reserved for Vectors                                 151    97                   3    E100025C
Reserved for Vectors                                 152    98                   3    E1000260
Reserved for Vectors                                 153    99                   3    E1000264
Reserved for Vectors                                 154    9A                   3    E1000268
Reserved for Vectors                                 155    9B                   3    E100026C
Reserved for Vectors                                 156    9C                   3    E1000270
Reserved for Vectors                                 157    9D                   3    E1000274
Reserved for Vectors                                 158    9E                   3    E1000278
Reserved for Vectors                                 159    9F                   3    E100027C

Table 2-18 (Cont.): Processor Registers

                                                     Number
Register Name                               Mnemonic (Dec)  (Hex)  Type  Impl    Cat  I/O Address

Cbox Control Register                       CCTL     160    A0     RW    NVAX    2-5
Reserved for Cbox                                    161    A1           NVAX    2-6
Bcache Data ECC                             BCDECC   162    A2     W     NVAX    2-5
Bcache Error Tag Status                     BCETSTS  163    A3     RW    NVAX    2-5
Bcache Error Tag Index                      BCETIDX  164    A4     R     NVAX    2-5
Bcache Error Tag                            BCETAG   165    A5     R     NVAX    2-5
Bcache Error Data Status                    BCEDSTS  166    A6     RW    NVAX    2-5
Bcache Error Data Index                     BCEDIDX  167    A7     R     NVAX    2-5
Bcache Error ECC                            BCEDECC  168    A8     R     NVAX    2-5
Reserved for Cbox                                    169    A9           NVAX    2-6
Reserved for Cbox                                    170    AA           NVAX    2-6
Fill Error Address                          CEFADR   171    AB     R     NVAX    2-5
Fill Error Status                           CEFSTS   172    AC     RW    NVAX    2-5
Reserved for Cbox                                    173    AD           NVAX    2-6
NDAL Error Status                           NESTS    174    AE     RW    NVAX    2-5
Reserved for Cbox                                    175    AF           NVAX    2-6
NDAL Error Output Address                   NEOADR   176    B0     R     NVAX    2-5
Reserved for Cbox                                    177    B1           NVAX    2-6
NDAL Error Output Command                   NEOCMD   178    B2     R     NVAX    2-5
Reserved for Cbox                                    179    B3           NVAX    2-6
NDAL Error Data High                        NEDATHI  180    B4     R     NVAX    2-5
Reserved for Cbox                                    181    B5           NVAX    2-6
NDAL Error Data Low                         NEDATLO  182    B6     R     NVAX    2-5
Reserved for Cbox                                    183    B7           NVAX    2-6
NDAL Error Input Command                    NEICMD   184    B8     R     NVAX    2-5
Reserved for Cbox                                    185    B9           NVAX    2-6
Reserved for Cbox                                    186    BA           NVAX    2-6
Reserved for Cbox                                    187    BB           NVAX    2-6
Reserved for Cbox                                    188    BC           NVAX    2-6
Reserved for Cbox                                    189    BD           NVAX    2-6
Reserved for Cbox                                    190    BE           NVAX    2-6
Reserved for Cbox                                    191    BF           NVAX    2-6

Table 2-18 (Cont.): Processor Registers

                                                     Number
Register Name                               Mnemonic (Dec)  (Hex)  Type  Impl    Cat  I/O Address

Reserved                                             192    C0                   3    E1000300
Reserved                                             193    C1                   3    E1000304
Reserved                                             194    C2                   3    E1000308
Reserved                                             195    C3                   3    E100030C
Reserved                                             196    C4                   3    E1000310
Reserved                                             197    C5                   3    E1000314
Reserved                                             198    C6                   3    E1000318
Reserved                                             199    C7                   3    E100031C
Reserved                                             200    C8                   3    E1000320
Reserved                                             201    C9                   3    E1000324
Reserved                                             202    CA                   3    E1000328
Reserved                                             203    CB                   3    E100032C
Reserved                                             204    CC                   3    E1000330
Reserved                                             205    CD                   3    E1000334
Reserved                                             206    CE                   3    E1000338
Reserved                                             207    CF                   3    E100033C
VIC Memory Address Register                 VMAR     208    D0     RW    NVAX    2-5
VIC Tag Register                            VTAG     209    D1     RW    NVAX    2-5
VIC Data Register                           VDATA    210    D2     RW    NVAX    2-5
Ibox Control and Status Register            ICSR     211    D3     RW    NVAX    2-5
Ibox Branch Prediction Control Register 3   BPCR     212    D4     RW    NVAX    2-5
Reserved for Ibox                                    213    D5           NVAX    2-6
Ibox Backup PC 4                            BPC      214    D6     R     NVAX    2-5
Ibox Backup PC with RLOG Unwind 4           BPCUNW   215    D7     R     NVAX    2-5
Reserved for Ibox                                    216    D8           NVAX    2-6
Reserved for Ibox                                    217    D9           NVAX    2-6
Reserved for Ibox                                    218    DA           NVAX    2-6
Reserved for Ibox                                    219    DB           NVAX    2-6
Reserved for Ibox                                    220    DC           NVAX    2-6
Reserved for Ibox                                    221    DD           NVAX    2-6
Reserved for Ibox                                    222    DE           NVAX    2-6
Reserved for Ibox                                    223    DF           NVAX    2-6





3 Testability and diagnostic use only, not for software use in normal operation 

4 Chip test use only; not for software use 
Table 2-18 (Cont.): Processor Registers

                                                     Number
Register Name                               Mnemonic (Dec)  (Hex)  Type  Impl    Cat  I/O Address

Mbox P0 Base Register 3                     MP0BR    224    E0     RW    NVAX    2-5
Mbox P0 Length Register 3                   MP0LR    225    E1     RW    NVAX    2-5
Mbox P1 Base Register 3                     MP1BR    226    E2     RW    NVAX    2-5
Mbox P1 Length Register 3                   MP1LR    227    E3     RW    NVAX    2-5
Mbox System Base Register 3                 MSBR     228    E4     RW    NVAX    2-5
Mbox System Length Register 3               MSLR     229    E5     RW    NVAX    2-5
Mbox Memory Management Enable 3             MMAPEN   230    E6     RW    NVAX    2-5
Mbox Physical Address Mode                  PAMODE   231    E7     RW    NVAX    2-5
Mbox MME Address                            MMEADR   232    E8     R     NVAX    2-5
Mbox MME PTE Address                        MMEPTE   233    E9     R     NVAX    2-5
Mbox MME Status                             MMESTS   234    EA     R     NVAX    2-5
Reserved for Mbox                                    235    EB           NVAX    2-6
Mbox TB Parity Address                      TBADR    236    EC     R     NVAX    2-5
Mbox TB Parity Status                       TBSTS    237    ED     RW    NVAX    2-5
Reserved for Mbox                                    238    EE           NVAX    2-6
Reserved for Mbox                                    239    EF           NVAX    2-6
Reserved for Mbox                                    240    F0           NVAX    2-6
Reserved for Mbox                                    241    F1           NVAX    2-6
Mbox Pcache Parity Address                  PCADR    242    F2     R     NVAX    2-5
Reserved for Mbox                                    243    F3           NVAX    2-6
Mbox Pcache Status                          PCSTS    244    F4     RW    NVAX    2-5
Reserved for Mbox                                    245    F5           NVAX    2-6
Reserved for Mbox                                    246    F6           NVAX    2-6
Reserved for Mbox                                    247    F7           NVAX    2-6
Mbox Pcache Control                         PCCTL    248    F8     RW    NVAX    2-5
Reserved for Mbox                                    249    F9           NVAX    2-6
Reserved for Mbox                                    250    FA           NVAX    2-6
Reserved for Mbox                                    251    FB           NVAX    2-6
Reserved for Mbox                                    252    FC           NVAX    2-6
Reserved for Mbox                                    253    FD           NVAX    2-6
Reserved for Mbox                                    254    FE           NVAX    2-6
Reserved for Mbox                                    255    FF           NVAX    2-6



3 Testability and diagnostic use only; not for software use in normal operation 
Table 2-18 (Cont.): Processor Registers

                                                     Number
Register Name                               Mnemonic (Dec)  (Hex)  Type  Impl    Cat  I/O Address

Unimplemented                                        100-00FFFFFF (hex)          3
See Table 2-17                                       01000000- (hex)             2



Type: 

R = Read-only register 
RW = Read-write register 
W = Write-only register 

Impl(emented): 

NVAX = Implemented in the NVAX CPU chip 

System = Implemented in the system environment 

Vector = Implemented in the optional vector unit or its NDAL interface 

Cat(egory), class-subclass, where: 
class is one of: 

1 = Implemented as per DEC standard 032 

2 = NVAX-specific implementation which is unique or different from the DEC standard 032 implementation 

3 = Not implemented internally; converted to I/O space read or write and passed to system environment 

subclass is one of: 

1 = Processed as appropriate by Ebox microcode 

2 = Converted to Mbox IPR number and processed via internal IPR command 

3 = Processed by internal IPR command, then converted to I/O space read or write and passed to system environment 

4 = If virtual machine option is implemented, processed as in 1, otherwise as in 3 

5 = Processed by internal IPR command 

6 = May be block decoded; reference causes UNDEFINED behavior 

7 = Full interval timer may be implemented in the system environment. Subset ICCS is implemented in NVAX CPU chip 
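
As an illustration of how a Cat entry is read, the following Python sketch splits an entry such as 2-5 into its class and subclass and looks up the corresponding legend text. Python is not used elsewhere in this specification, and the names CAT_CLASS, CAT_SUBCLASS, and describe_cat are hypothetical, chosen only for this example; the dictionary strings abbreviate the legend above.

```python
from typing import Optional, Tuple

# Illustrative sketch only: these dictionaries restate (in shortened form)
# the Cat legend of Table 2-18. All names here are hypothetical.
CAT_CLASS = {
    1: "Implemented as per DEC standard 032",
    2: "NVAX-specific implementation",
    3: "Not implemented internally; passed to system environment as I/O space access",
}

CAT_SUBCLASS = {
    1: "Processed as appropriate by Ebox microcode",
    2: "Converted to Mbox IPR number and processed via internal IPR command",
    3: "Processed by internal IPR command, then passed to system environment",
    4: "As subclass 1 if virtual machine option is implemented, otherwise as subclass 3",
    5: "Processed by internal IPR command",
    6: "May be block decoded; reference causes UNDEFINED behavior",
    7: "Full interval timer may be in system environment; subset in NVAX CPU chip",
}


def describe_cat(cat: str) -> Tuple[str, Optional[str]]:
    """Split a Cat entry ("3" or "2-5") into class and subclass descriptions."""
    class_part, _, subclass_part = cat.partition("-")
    class_text = CAT_CLASS[int(class_part)]
    # Entries such as "3" carry a class only, so the subclass is None.
    subclass_text = CAT_SUBCLASS[int(subclass_part)] if subclass_part else None
    return class_text, subclass_text


if __name__ == "__main__":
    print(describe_cat("2-5"))
    print(describe_cat("3"))
```

An entry with no subclass, such as the 3 used for most Reserved registers, yields None for the subclass description.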
2.13 I/O space Addresses 

As noted above, processor registers that are not implemented on the NVAX CPU chip are 
converted to I/O space reads or writes. Most of these IPRs are optional and may be implemented 
or not, as dictated by the needs of the system environment. The I/O space registers that must be 
implemented by the system environment are shown in Table 2-19. 

Table 2-19: I/O Space Registers 

I/O Space
Address (Hex)  Type  Definition
E0040000       RO    Powerup boot ROM address from which the first instruction is fetched.
E1000100       RO    Interrupt acknowledge for an IPL 14 (hex) interrupt requested via the IRQ_L<0> pin.
E1000104       RO    Interrupt acknowledge for an IPL 15 (hex) interrupt requested via the IRQ_L<1> pin.
E1000108       RO    Interrupt acknowledge for an IPL 16 (hex) interrupt requested via the IRQ_L<2> pin.
E100010C       RO    Interrupt acknowledge for an IPL 17 (hex) interrupt requested via the IRQ_L<3> pin.
E1000110       RW    Location which invokes a write buffer flush in the system environment.
                     When this location is read, the CPU is waiting for confirmation that the
                     flush has completed. The returned data is ignored.
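
The I/O addresses listed for category-3 registers in Table 2-18 follow a regular pattern: each address equals E1000000 (hex) plus four times the IPR number. The Python sketch below illustrates that pattern. The rule is inferred from the table entries rather than stated in the text, and the names IPR_IO_BASE and ipr_io_address are hypothetical.

```python
# Illustrative sketch only: the base address and the 4-byte stride are
# inferred from the entries of Table 2-18, not stated as a rule in the text.
IPR_IO_BASE = 0xE1000000  # assumed base of the IPR region in I/O space


def ipr_io_address(ipr_number: int) -> int:
    """Return the I/O space address implied by Table 2-18 for an IPR number."""
    if not 0 <= ipr_number <= 0xFF:
        raise ValueError("this sketch covers IPR numbers 0 through 255 only")
    return IPR_IO_BASE + 4 * ipr_number


if __name__ == "__main__":
    # CWB is IPR 68 (44 hex); Table 2-18 lists it at E1000110.
    print(f"{ipr_io_address(68):08X}")
    # VPSR is IPR 144 (90 hex); Table 2-18 lists it at E1000240.
    print(f"{ipr_io_address(144):08X}")
```

The CWB result matches the write buffer flush location in Table 2-19, which is the same I/O space address.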
2.14 Revision History 

Table 2-20: Revision History 

Who         When         Description of change
Mike Uhler  06-Mar-1989  Release for external review.
Mike Uhler  15-Dec-1989  Update for second-pass release.
Mike Uhler  20-Jul-1990  Update to reflect implementation.
Mike Uhler  04-Dec-1990  Update after pass 1 PG.
Chapter 3 

NVAX Chip Interface 



3.1 Introduction 

The NVAX chip communicates through five interfaces: the NDAL (NVAX data-address lines), the 
backup cache interface, the interrupt lines, the clocking interface, and the test interface. 

This chapter begins by listing all the NVAX pins and giving a brief description of each. The rest of 
the chapter describes the NDAL protocol in detail. The other interfaces are described as follows: 
the backup cache interface in Chapter 13, the interrupt lines in Chapter 10, the test interface 
in Chapter 19, and the clocking interface in Chapter 17. 

The NDAL is a 64-bit pended bidirectional bus which is used by the NVAX CPU to communicate 
with the system environment. The NDAL cycle time is three times the NVAX CPU cycle time. 
The NVAX CPU cycle time is targeted to 14 ns, making the NDAL cycle time 42 ns. Binned 
CPU parts may run at 10 ns, resulting in an NDAL cycle time of 30 ns. The NDAL 
supports up to four (4) nodes with a maximum of one (1) NVAX CPU. In this spec, these four 
nodes are referred to as CPU (NVAX), IO1_NODE, IO2_NODE, and the memory interface. 
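
The cycle-time relationship above is simple arithmetic; the short Python sketch below reproduces the two quoted cases and derives the corresponding NDAL frequency. The function and constant names are hypothetical, and the frequency figure is a derived illustration rather than a value given in this specification.

```python
# Illustrative arithmetic only; names are hypothetical.
CPU_CYCLES_PER_NDAL_CYCLE = 3  # each NDAL cycle is three CPU cycles long


def ndal_cycle_ns(cpu_cycle_ns: float) -> float:
    """Return the NDAL cycle time for a given CPU cycle time (both in ns)."""
    return CPU_CYCLES_PER_NDAL_CYCLE * cpu_cycle_ns


def ndal_frequency_mhz(cpu_cycle_ns: float) -> float:
    """Return the NDAL cycle rate in MHz (1000 / period in ns)."""
    return 1000.0 / ndal_cycle_ns(cpu_cycle_ns)


if __name__ == "__main__":
    for cpu_ns in (14, 10):
        print(cpu_ns, "ns CPU cycle ->", ndal_cycle_ns(cpu_ns), "ns NDAL cycle,",
              round(ndal_frequency_mhz(cpu_ns), 2), "MHz")
```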

The NVAX CPU contains a writethrough primary cache and a writeback backup cache. The NDAL 
is designed to support the writeback cache and cache coherency in a multiprocessor system. 

NOTE 

IMPORTANT INFORMATION REGARDING THE NVAX CHIP INTERFACE IS ALSO 
CONTAINED IN Chapter 10 (The Interrupt Section), Chapter 13 (The Cbox), Chapter 17 
(Chip Clocking), AND Chapter 19 (Testability Micro-Architecture). THE READER 
MUST CONSULT THOSE CHAPTERS IN ORDER TO OBTAIN COMPLETE INFORMATION. 

3.2 NVAX CPU pinout 

The NVAX CPU chip contains the pins listed in Table 3-1. Following the table, each pin is 
described in more detail. 
Table 3-1: NVAX CPU pinout 

Pin                 I/O 1 Type 2    Function                            Number  Running Total

NDAL SIGNALS (80 total) 3
P%CPU_REQ_L         O     SS,1D1R   NVAX Request                        1       1
P%CPU_HOLD_L        O     SS,1D1R   NVAX Hold                           1       2
P%CPU_SUPPRESS_L    O     SS,1D1R   NVAX Suppress                       1       3
P%CPU_GRANT_L       I     SS,1D1R   NVAX Grant                          1       4
P%CPU_WB_ONLY_L     I     SS,1D1R   Writeback Only                      1       5
P%NDAL_H<63:0>      IO    T,4D4R    Data/Address Lines                  64      69
P%CMD_H<3:0>        IO    T,4D4R    Command                             4       73
P%ID_H<2:0>         IO    T,4D4R    Node Identification Lines           3       76
P%PARITY_H<2:0>     IO    T,4D4R    NDAL Parity                         3       79
P%ACK_L             IO    OD,4D4R   Acknowledge                         1       80

CLOCKS (15 total) 4
P%OSC_H             I     SS,1D1R   Oscillator, High Asserted           1       81
P%OSC_L             I     SS,1D1R   Oscillator, Low Asserted            1       82
P%OSC_TC1_H         I     SS,1D1R   Test Clock/Timeout Clock            1       83
P%OSC_TC2_H         I     SS,1D1R   Test Clock                          1       84
P%OSC_TEST_H        I     SS,1D1R   Test Clock Control                  1       85
P%PHI12_OUT_H       O     SS,1D4R   NDAL PHI12, Driven                  1       86
P%PHI23_OUT_H       O     SS,1D4R   NDAL PHI23, Driven                  1       87
P%PHI34_OUT_H       O     SS,1D4R   NDAL PHI34, Driven                  1       88
P%PHI41_OUT_H       O     SS,1D4R   NDAL PHI41, Driven                  1       89
P%PHI12_IN_H        I     SS,1D4R   NDAL PHI12, Received                1       90
P%PHI23_IN_H        I     SS,1D4R   NDAL PHI23, Received                1       91
P%PHI34_IN_H        I     SS,1D4R   NDAL PHI34, Received                1       92
P%PHI41_IN_H        I     SS,1D4R   NDAL PHI41, Received                1       93
P%ASYNC_RESET_L     I     SS,1D1R   Reset Input to NVAX                 1       94
P%SYS_RESET_L       O     SS,1D3R   Reset Output to System              1       95

INTERRUPT AND ERROR SIGNALS (10 total) 5
P%MACHINE_CHECK_H   O     SS,1D1R   Machine Check                       1       96
P%IRQ_L<3:0>        I     OD,3D1R   Interrupt Request Lines             4       100
P%H_ERR_L           I     OD,3D1R   Hard (unrecoverable) Error          1       101
P%S_ERR_L           I     OD,3D1R   Soft (recoverable) Error            1       102
P%INT_TIM_L         I     SS,1D1R   Interval Timer Request              1       103
P%PWRFL_L           I     SS,1D1R   Power Fail                          1       104
P%HALT_L            I     SS,1D1R   Halt                                1       105

Pin                 I/O   Type      Function                            Number  Running Total

BACKUP CACHE SIGNALS (133 total) 6
P%TS_INDEX_H<20:5>  O     SS,1D6R   Tag Store Index Lines               16      121
P%TS_OE_L           O     SS,1D6R   Tag Store Output Enable             1       122
P%TS_WE_L           O     SS,1D6R   Tag Store Write Enable              1       123
P%TS_TAG_H<31:17>   IO    T,7D7R    Tag Store Tag                       15      138
P%TS_ECC_H<5:0>     IO    T,7D7R    Tag Store ECC                       6       144
P%TS_OWNED_H        IO    T,7D7R    Tag Store Owned Bit                 1       145
P%TS_VALID_H        IO    T,7D7R    Tag Store Valid Bit                 1       146
P%DR_INDEX_H<20:3>  O     SS,1D18R  Data RAM Index Lines                18      164
P%DR_OE_L           O     SS,1D18R  Data RAM Output Enable              1       165
P%DR_WE_L           O     SS,1D18R  Data RAM Write Enable               1       166
P%DR_DATA_H<63:0>   IO    T,19D19R  Data RAM Data Lines                 64      230
P%DR_ECC_H<7:0>     IO    T,19D19R  Data RAM ECC                        8       238

TEST SIGNALS (23 total) 7
P%TEST_DATA_H       I     SS,1D1R   Test data input for microcode use.  1       239
P%TEST_STROBE_H     I     SS,1D1R   Test strobe for microcode use.      1       240
P%DISABLE_OUT_L     I     SS,1D1R   Disable NVAX Outputs                1       241
P%TEMP_H            O     SS,1D1R   NVAX Temperature Output             1       242
P%TMS_H             I     SS,1D1R   JTAG Test Mode Select               1       243
P%TCK_H             I     SS,1D1R   JTAG Test Clock                     1       244
P%TDI_H             I     SS,1D1R   JTAG Serial Test Data Input         1       245
P%TDO_H             O     SS,1D2R   JTAG Serial Test Data Output        1       246
P%PP_CMD_H<2:0>     I     SS,1D1R   Parallel Test Port Command          3       249
P%PP_DATA_H<11:0>   O     T,2D2R    Parallel Test Port Data             12      261



1 Indicates whether the pin is an NVAX CPU Input, Output, or Input/Output pin. 

2 Single Source is denoted by SS, Tristate by T, Open Drain by OD; #D indicates the maximum number of drivers and #R 
indicates the maximum number of receivers expected on the board. 

3 These pins are discussed in detail in this chapter. 

4 These pins are discussed in detail in Chapter 17 

5 These pins are discussed in detail in Chapter 10 

6 These pins are discussed in detail in Chapter 13 

7 These pins are discussed in detail in Chapter 19 
3.2.1 NDAL Signals and Timing 

The functionality of the NDAL pins is described in detail in Section 3.3. The timing of the pins 
is shown in Figure 3-1, and the AC specs are given in Table 3-2. 

NOTE 

The timing of the NDAL signals is given relative to the NDAL clocks which are received 
by NVAX: P%PHI12_IN_H, P%PHI23_IN_H, P%PHI34_IN_H, and P%PHI41_IN_H. 
NVAX drivers were designed to meet this timing, taking the NDAL clock skew into 
account. (NDAL clock skew is covered in Chapter 17.) NVAX expects to receive signals 
which have been designed taking the clock skew into account; NVAX receivers account 
for no clock skew. 
Figure 3-1: NDAL Pin Timing Relative to the NDAL CLOCKS 

[Timing diagram not reproduced. One NDAL cycle spans phases P1, P2, P3, and P4 of the received 
clocks P%PHI12_IN_H, P%PHI23_IN_H, P%PHI34_IN_H, and P%PHI41_IN_H. The annotations 
on the diagram are as follows.] 

P%ID_H<2:0>, P%PARITY_H<2:0>, P%NDAL_H<63:0>, P%CMD_H<3:0>, as driven by NVAX CPU: 
driven from P%PHI12_IN_H rising edge; released with P%PHI41_IN_H rising edge. 

P%ID_H<2:0>, P%PARITY_H<2:0>, P%NDAL_H<63:0>, P%CMD_H<3:0>, as received by NVAX CPU: 
latch closes with P%PHI41_IN_H rising edge (latch open during phi23). 

P%ACK_L, as pulled low by NVAX CPU (pulled high through board pullup resistor): 
NVAX pulls low with P%PHI23_IN_H rising; NVAX releases with P%PHI23_IN_H falling. 

P%ACK_L, as required by NVAX CPU: 
latch closes with P%PHI34_IN_H rising (latch open during phi12). 

P%CPU_HOLD_L, P%CPU_SUPPRESS_L, P%CPU_REQ_L, as driven by NVAX CPU: 
driven with P%PHI12_IN_H rising edge. 

P%CPU_WB_ONLY_L, P%CPU_GRANT_L, as required by NVAX CPU: 
latch closes with P%PHI41_IN_H rising edge (latch open during phi23). 
Table 3-2: NDAL AC timing specs 

Input Pin         Setup Time 1                Hold Time
P%NDAL_H<63:0>    1 phase to P%PHI41_IN_H R   P%PHI41_IN_H R + 2ns 2
P%CMD_H<3:0>      "                           "
P%ID_H<2:0>       "                           "
P%PARITY_H<2:0>   "                           "
P%ACK_L           0 ns to P%PHI34_IN_H R      P%PHI34_IN_H R + 1 phase
P%CPU_WB_ONLY_L   0 ns to P%PHI41_IN_H R      P%PHI41_IN_H R + 1 phase
P%CPU_GRANT_L     "                           "

Output Pin        Drive Time                  Tristate Time
P%NDAL_H<63:0>    P%PHI12_IN_H R + 2 phases   P%PHI41_IN_H R + 1 phase
P%CMD_H<3:0>      "                           "
P%ID_H<2:0>       "                           "
P%PARITY_H<2:0>   "                           "
P%ACK_L           P%PHI23_IN_H R + 1 phase (low transition),
                  P%PHI23_IN_H F + 3 phases (high transition) 3
P%CPU_HOLD_L      P%PHI12_IN_H R + 1 phase
P%CPU_SUPPRESS_L  "
P%CPU_REQ_L       "

1 R means the rising edge of the clock is used; F means the falling edge of the clock is used. 

2 The 2ns hold time requirement on the NDAL is as follows: the data does not have to be actively driven for this amount 
of time if the driver ensures that the values will be capacitively held on the bus for 2ns past the phi4 rising edge. 

3 P%ACK_L is pulled up through a resistor in the system; the same must be done on the test load board. 



3.2.1.1 P%CPU_REQ_L 

NVAX asserts P%CPU_REQ_L to request the NDAL for the following cycle. P%CPU_REQ_L 
is a unidirectional signal from NVAX to the arbiter. 
3.2.1.2 P%CPU_HOLD_L 

The NVAX CPU asserts P%CPU_HOLD_L in order to drive the NDAL on consecutive cycles. 

3.2.1.3 P%CPU_SUPPRESS_L 

NVAX asserts P%CPU_SUPPRESS_L in order to suppress new NDAL transactions. While 
P%CPU_SUPPRESS_L is asserted, only fills and writebacks are allowed to proceed from 
non-CPU nodes. 

3.2.1.4 P%CPU_GRANT_L 

P%CPU_GRANT_L is asserted to notify NVAX that it must drive the NDAL during the following 
cycle. 

3.2.1.5 P%CPU_WB_ONLY_L 

When the system asserts P%CPU_WB_ONLY_L, NVAX only issues WDISOWN or NOP commands. 

3.2.1.6 P%NDAL_H<63:0> 

NVAX uses P%NDAL_H<63:0> to transfer address and data information to and from the system. 

3.2.1.7 P%CMD_H<3:0> 

The P%CMD_H<3:0> lines contain the NDAL command during any given cycle. NVAX drives 
and receives these lines. 

3.2.1.8 P%ID_H<2:0> 

NVAX drives and receives P%ID_H<2:0>, which contain the node identification number for every 
cycle. These lines identify which node is driving the NDAL or which node is to receive the NDAL, 
depending upon the current command. 

3.2.1.9 P%PARITY_H<2:0> 

NVAX drives and receives P%PARITY_H<2:0>, which contains parity computed over 
P%NDAL_H<63:0>, P%CMD_H<3:0> and P%ID_H<2:0> during every NDAL cycle. 

3.2.1.10 P%ACK_L 

NVAX asserts P%ACK_L when it has received a fill data cycle. NVAX receives P%ACK_L as an 
acknowledgement that its outgoing cycle was successfully received. It also receives P%ACK_L 
for cycles which it did not drive on the NDAL, as a way of detecting inconsistent parity errors. 
An inconsistent parity error occurs when NVAX detects a parity error on the NDAL and also 
notices that P%ACK_L was asserted for that cycle. 

P%ACK_L is an open drain signal which is pulled high (deasserted) by an external resistor on 
the board. 
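
The inconsistent parity error condition and the open-drain behavior of P%ACK_L can be sketched as below. This Python model is an illustration only: the function names are hypothetical, signal levels are represented as 0 (low) and 1 (high), and the model assumes, following the _L suffix and the pull-up description above, that low means asserted.

```python
from typing import Iterable

# Illustrative model only; names are hypothetical.
ACK_ASSERTED_LEVEL = 0  # P%ACK_L is low-asserted; the pull-up holds it high when idle


def resolve_open_drain(pulling_low: Iterable[bool]) -> int:
    """Return the line level: low if any node pulls it low, else pulled high."""
    return 0 if any(pulling_low) else 1


def ack_asserted(ack_l_level: int) -> bool:
    """Return True when the P%ACK_L level represents an assertion."""
    return ack_l_level == ACK_ASSERTED_LEVEL


def inconsistent_parity_error(parity_error_detected: bool, ack_l_level: int) -> bool:
    """NVAX saw bad parity on a cycle that some node nevertheless acknowledged."""
    return parity_error_detected and ack_asserted(ack_l_level)


if __name__ == "__main__":
    # Another node acknowledges a cycle on which NVAX detected a parity error.
    level = resolve_open_drain([False, True, False])
    print(inconsistent_parity_error(True, level))
```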
3.2.2 Clocking signals 

The NVAX CPU chip generates four two-phase clocks which are distributed to the system. These 
clocks are also routed back to the NVAX chip itself, which minimizes skew between NVAX and 
the other chips on the NDAL. Each NDAL cycle is three CPU cycles long. 

The clocking signals are described in detail in Chapter 17. 

3.2.2.1 P%OSC_H, P%OSC_L 

P%OSC_H and P%OSC_L are complementary oscillator inputs to NVAX. They are used to gen- 
erate on-chip clocks and system clocks. When P%OSC_TEST_H is deasserted, P%OSC_H and 
P%OSC_L are used to generate NVAX clocks. 

3.2.2.2 P%OSC_TC1_H, P%OSC_TC2_H 

P%OSC_TC1_H and P%OSC_TC2_H are oscillator inputs to NVAX for use during testing only. 
When P%OSC_TEST_H is asserted, P%OSC_TC1_H and P%OSC_TC2_H are used to generate 
NVAX clocks. 

P%OSC_TC1_H and P%OSC_TC2_H are 90 degrees out of phase with each other, and are XOR'd 
internally to produce an internal clock which runs at twice the speed. This allows NVAX to run 
at full speed while the input clocks are running at half speed. 
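
The frequency doubling can be checked with a small sampled model. The Python sketch below is an illustration under simplifying assumptions (ideal 50% duty-cycle square waves, eight samples per input period); the function names and constants are hypothetical and do not describe the on-chip clock circuit.

```python
from typing import List


def square_wave(period: int, delay: int, length: int) -> List[int]:
    """Return `length` samples of a 50% duty-cycle square wave.

    `delay` shifts the waveform later in time by that many samples.
    """
    half = period // 2
    return [1 if (t - delay) % period < half else 0 for t in range(length)]


def rising_edge_period(samples: List[int]) -> int:
    """Return the spacing, in samples, between the first two rising edges."""
    edges = [t for t in range(1, len(samples))
             if samples[t - 1] == 0 and samples[t] == 1]
    return edges[1] - edges[0]


PERIOD = 8             # samples per input clock period (arbitrary choice)
QUARTER = PERIOD // 4  # a 90-degree phase offset is a quarter of a period

tc1 = square_wave(PERIOD, 0, 4 * PERIOD)        # stands in for P%OSC_TC1_H
tc2 = square_wave(PERIOD, QUARTER, 4 * PERIOD)  # stands in for P%OSC_TC2_H
internal = [a ^ b for a, b in zip(tc1, tc2)]    # XOR of the two inputs

if __name__ == "__main__":
    print("input clock period   :", rising_edge_period(tc1))
    print("internal clock period:", rising_edge_period(internal))
```

In this model the XOR output repeats every four samples while each input repeats every eight, which is the factor of two described above.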

P%OSC_TC1_H is also used as an input to the Ebox base timeout counter as an alternate clock 
for the timeout counter. Normally, the base counter is run from the internal NVAX clock; if the 
system designer wants to lengthen the timeout values used by NVAX, the base counter may be 
configured to run from P%OSC_TC1_H instead. P%OSC_TC1_H is synchronized to the internal 
NVAX clocks in order to be used for this purpose. 

3.2.2.3 P%OSC_TEST_H 

P%OSC_TEST_H is a control pin which determines which oscillator inputs are used by the clock 
generators. When P%OSC_TEST_H is deasserted, P%OSC_H and P%OSC_L are used; when 
P%OSC_TEST_H is asserted, P%OSC_TC1_H and P%OSC_TC2_H are used. 

3.2.2.4 P%PHI12_OUT_H, P%PHI23_OUT_H, P%PHI34_OUT_H, P%PHI41_OUT_H 

These two-phase overlapping clocks are driven from the NVAX chip to all nodes on the NDAL, 
including back to NVAX itself. 

3.2.2.5 P%PHI12_IN_H, P%PHI23_IN_H, P%PHI34_IN_H, P%PHI41_IN_H 

These NVAX pins are used to receive the NDAL clocks, which are driven from P%PHI12_OUT_H, 
P%PHI23_OUT_H, P%PHI34_OUT_H, and P%PHI41_OUT_H. 

3.2.2.6 P%ASYNC_RESET_L 

P%ASYNC_RESET_L is an asynchronous input to NVAX which is used to generate an internal 
reset signal as well as P%SYS_RESET_L. 
3.2.2.7 P%SYS_RESET_L 

NVAX drives P%SYS_RESET_L to notify all NDAL receivers to reset. It is deasserted 
synchronously with the NDAL clocks. 

3.2.3 Interrupt and Error Signals 

The interrupt and error signals are described in detail in Chapter 10. 

3.2.3.1 P%MACHINE_CHECK_H 

The assertion of P%MACHINE_CHECK_H indicates that the CPU is in a machine check 
sequence. This signal may be wired to an LED on the board. (The pin is not able to drive the LED 
directly.) It will flicker during a normal machine check. If the CPU never comes out of machine 
check, the LED will stay lit, indicating to Field Service that the board needs to be replaced. 

3.2.3.2 P%IRQ_L<3:0> 

The P%IRQ_L<3:0> lines provide a general-purpose interrupt request facility to interrupt the 
NVAX CPU. These four external interrupt request lines correspond to interrupt requests at IPLs 
17, 16, 15, and 14 (hex). P%IRQ_L<3> corresponds to IPL 17, P%IRQ_L<2> corresponds to IPL 
16, P%IRQ_L<1> corresponds to IPL 15, and P%IRQ_L<0> corresponds to IPL 14. These lines 
are level-sensitive, NOT edge sensitive. Once a node asserts its interrupt line, it should keep it 
asserted until NVAX services the request. 

P%IRQ_L<3:0> are asynchronous inputs to NVAX and are not expected to operate with any fixed 
relationship to the NDAL timing. 
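
The line-to-IPL correspondence, together with the interrupt acknowledge addresses of Table 2-19, can be summarized in a short Python sketch. The names below are hypothetical, levels are represented as 0 (low, asserted) and 1 (high, deasserted) on the assumption that the _L suffix denotes a low-asserted signal, and the sketch only lists asserted requests; it does not model how NVAX arbitrates among them.

```python
from typing import List, Sequence

# Illustrative sketch only; names are hypothetical.
IRQ_BASE_IPL = 0x14              # P%IRQ_L<0> requests an interrupt at IPL 14 (hex)
IACK_BASE_ADDRESS = 0xE1000100   # interrupt acknowledge address for P%IRQ_L<0>
IRQ_LINE_COUNT = 4


def _check_line(line: int) -> None:
    if not 0 <= line < IRQ_LINE_COUNT:
        raise ValueError("P%IRQ_L has lines 0 through 3 only")


def irq_line_to_ipl(line: int) -> int:
    """Return the IPL requested by P%IRQ_L<line>."""
    _check_line(line)
    return IRQ_BASE_IPL + line


def irq_line_to_iack_address(line: int) -> int:
    """Return the I/O space address read to acknowledge P%IRQ_L<line>."""
    _check_line(line)
    return IACK_BASE_ADDRESS + 4 * line


def asserted_ipls(irq_l_levels: Sequence[int]) -> List[int]:
    """List the IPLs being requested; element n is the level of P%IRQ_L<n>."""
    return [irq_line_to_ipl(line)
            for line, level in enumerate(irq_l_levels) if level == 0]


if __name__ == "__main__":
    for line in range(IRQ_LINE_COUNT):
        print(line, hex(irq_line_to_ipl(line)), f"{irq_line_to_iack_address(line):08X}")
```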

3.2.3.3 P%H_ERR_L 

P%H_ERR_L is used to notify NVAX of an error condition in the system which has corrupted 
machine state. These errors usually cannot be corrected by any retry mechanism. 

If at all possible, NDAL errors should be reported using the transaction level error reporting 
mechanisms (not asserting P%ACK_L or using the Read Data Error command). If this is not 
possible, P%H_ERR_L or P%S_ERR_L may be used. When P%H_ERR_L is asserted, NVAX 
will take a Hard Error Interrupt at IPL 1D (hex). 

P%H_ERR_L is an asynchronous input to NVAX and is not expected to operate with any fixed 
relationship to the NDAL timing. 

3.2.3.4 P%S_ERR_L 

The assertion of P%S_ERR_L indicates that an error which did not affect instruction 
execution has been detected in the system environment. For example, if an NDAL node uses the 
BADWDATA because of an uncorrectable error in its cache, it would also assert P%S_ERR_L to 
notify NVAX of the event. When it recognizes the assertion of P%S_ERR_L, NVAX takes a Soft 
Error Interrupt at IPL 1A (hex). 

P%S_ERR_L is an asynchronous input to NVAX and is not expected to operate with any fixed 
relationship to the NDAL timing. 
3.2.3.5 P%INT_TIM_L 

The assertion of P%INT_TIM_L indicates that the interval timer period has expired. 

P%INT_TIM_L is an asynchronous input to NVAX and is not expected to operate with any fixed 
relationship to the NDAL timing. 

3.2.3.6 P%PWRFL_L 

The assertion of P%PWRFL_L informs the CPU of an impending power failure. 

P%PWRFL_L is an asynchronous input to NVAX and is not expected to operate with any fixed 
relationship to the NDAL timing. 

3.2.3.7 P%HALT_L 

The assertion of P%HALT_L causes the CPU to enter the console at IPL 1F (hex) at the next 
macroinstruction boundary. 

P%HALT_L is an asynchronous input to NVAX and is not expected to operate with any fixed 
relationship to the NDAL timing. 

3.2.4 Cache interface signals 

These pins are described in detail in Chapter 13. The timing of the pins is shown in Figure 3-2. 

NOTE 

The timing of the Bcache interface signals is given relative to the INTERNAL NVAX 
clocks. 

3.2.4.1 P%TS_INDEX_H<20:5> 

P%TS_INDEX_H<20:5> drive the address lines of the backup cache tag RAMs, thus indexing 
into one row of the tag store. 

3.2.4.2 P%TS_OE_L 

This pin is connected to the output enable pins of the backup cache tag store RAMs. When NVAX 
asserts P%TS_OE_L, the RAMs are enabled to drive P%TS_TAG_H<31:17>, P%TS_VALID_H, 
P%TS_OWNED_H, and P%TS_ECC_H<5:0>. 

3.2.4.3 P%TS_WE_L 

This pin is connected to the write enable pins of the backup cache tag store RAMs. When 
NVAX asserts P%TS_WE_L, the RAMs are enabled to write the information on 
P%TS_TAG_H<31:17>, P%TS_VALID_H, P%TS_OWNED_H, and P%TS_ECC_H<5:0>, which NVAX drives 
when P%TS_WE_L is asserted. 
Figure 3-2: Bcache Pin Timing Relative to INTERNAL NVAX Clocks (14ns system) 

NOTE: All drive times are shown as simulated in the XNP board environment with typical (14ns) NVAX 
parts; drive times in other environments and with non-typical NVAX parts will differ. The diagram 
assumes a 14-ns NVAX cycle. 
3.2.4.4 P%TS_TAG_H<31:17> 

P%TS_TAG_H<31:17> carry the tag which is written to and read from the backup cache tag 
store. Each of these pads is built with an internal resistor so that if the tag bit is not used in a 
particular system, the pin value as seen by the Cbox is 0. For example, a machine which runs 
only in 30-bit mode does not need to connect P%TS_TAG_H<31:29> to the backup cache. 

3.2.4.5 P%TS_ECC_H<5:0> 

P%TS_ECC_H<5:0> carry the error correcting code which is written to and read from the backup 
cache tag store. 

3.2.4.6 P%TS_OWNED_H 

P%TS_OWNED_H carries the OWNED bit which is written to and read from the backup cache 
tag store. 

3.2.4.7 P%TS_VALID_H 

P%TS_VALID_H carries the VALID bit which is written to and read from the backup cache tag 
store. 

3.2.4.8 P%DR_INDEX_H<20:3> 

P%DR_INDEX_H<20:3> drive the address lines of the backup cache data RAMs, thus indexing 
into one row (one quadword) of the cache. 

3.2.4.9 P%DR_OE_L 

This pin is connected to the output enable pins of the backup cache data RAMs. When NVAX 
asserts P%DR_OE_L, the RAMs are enabled to drive P%DR_DATA_H<63:0> and 
P%DR_ECC_H<7:0>. 

3.2.4.10 P%DR_WE_L 

This pin is connected to the write enable pins of the backup cache data RAMs. When NVAX asserts 
P%DR_WE_L, the RAMs are enabled to write the information on P%DR_DATA_H<63:0> and 
P%DR_ECC_H<7:0>, which NVAX drives when P%DR_WE_L is asserted. 

3.2.4.11 P%DR_DATA_H<63:0> 

P%DR_DATA_H<63:0> carry the cache data which is written to and read from the backup cache. 

3.2.4.12 P%DR_ECC_H<7:0> 

P%DR_ECC_H<7:0> carry the error correcting code which is written to and read from the 
backup cache data RAMs. 

3.2.5 Test Pins 

These pins are covered in more detail in Chapter 19. 

3.2.5.1 P%TEST_DATA_H 

TEST_DATA_H is an asynchronous input pin which may be used by microcode. It is pulled high 
internally so that if it is not used, it does not have to be connected on the board. 

3.2.5.2 P%TEST_STROBE_H 

TEST_STROBE_H is an asynchronous input pin which may be used by microcode. It is pulled 
high internally so that if it is not used, it does not have to be connected on the board. 

3.2.5.3 P%DISABLE_OUT_L 

When P%DISABLE_OUT_L is asserted, NVAX does not drive any of its Input/Output or Output 
pins, including the NDAL clock outputs (P%PHI12_OUT_H, P%PHI23_OUT_H, 
P%PHI34_OUT_H and P%PHI41_OUT_H). 

This functionality is used only during test. 

3.2.5.4 P%TEMP_H 

P%TEMP_H is an output pin to be used in test to determine when the NVAX CPU chip is at 
thermal equilibrium. The voltage on this pin will vary between VDD_I and VSS_I, depending on 
chip temperature, but the temperature to voltage transfer function will not be specified. 

As the chip heats up the voltage on the pin will fall, and once the chip is at thermal equilibrium 
the voltage will remain at some value below VDD_I. This voltage will be monitored by the tester, 
and testing will commence only when the voltage stops changing, indicating that the chip is at 
thermal equilibrium. 

3.2.5.5 P%TMS_H 

P%TMS_H is the JTAG test mode select input. It is pulled high by an on-chip resistor when it 
is not being driven externally. 

3.2.5.6 P%TCK_H 

P%TCK_H is the JTAG test clock. It is pulled low by an on-chip resistor when it is not being 
driven externally. 

3.2.5.7 P%TDI_H 

P%TDI_H is the JTAG serial test data input. It is pulled high by an on-chip resistor when it is 
not being driven externally. 

3.2.5.8 P%TDO_H 

P%TDO_H is the JTAG serial test data output. 

3.2.5.9 P%PP_CMD_H<2:0> 

P%PP_CMD_H<2:0> provides the NVAX parallel port a command indicating the current function 
of the parallel port. 

3.2.5.10 P%PP_DATA_H<11:0> 

P%PP_DATA_H<11:0> are output pins for reading test data from NVAX. 




NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 



3.3 The NDAL 

The NDAL is a 64-bit limited length, pended, synchronous bus with centralized arbitration. 
Several transactions can be in progress at a given time, allowing highly efficient use of bus 
bandwidth. Arbitration and data transfers occur simultaneously. The bus uses multiplexed data 
and address lines. The NDAL supports quadword, octaword and hexaword reads and writes to 
memory and I/O space. 

The NDAL supports up to four (4) nodes with a maximum of one (1) NVAX CPU. In this spec, these 
four nodes are referred to as CPU (NVAX), I01_NODE, I02_NODE, and the memory interface. 

Thirty nanoseconds is the minimum NDAL cycle time being considered for a binned CPU. 
Operating at 30ns, the NDAL has a raw bandwidth of 267 Mbytes/second. At 42ns, the NDAL 
has a raw bandwidth of 190 Mbytes/second. The usable bandwidth, which depends on transaction 
length, is shown in Table 3-3 and Table 3-4. 

Table 3-3: NVAX DAL Bandwidth at 30ns 

Operation         Bandwidth 
Quadword Read     133.0 Mbytes/sec 
Octaword Read     178.0 Mbytes/sec 
Hexaword Read     213.0 Mbytes/sec 
Quadword Write    133.0 Mbytes/sec 
Octaword Write    178.0 Mbytes/sec 
Hexaword Write    213.0 Mbytes/sec 


Table 3-4: NVAX DAL Bandwidth at 42ns 

Operation         Bandwidth 
Quadword Read      95.0 Mbytes/sec 
Octaword Read     127.0 Mbytes/sec 
Hexaword Read     152.0 Mbytes/sec 
Quadword Write     95.0 Mbytes/sec 
Octaword Write    127.0 Mbytes/sec 
Hexaword Write    152.0 Mbytes/sec 
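
The usable-bandwidth figures in Table 3-3 and Table 3-4 appear consistent with charging each transaction one address cycle plus its data cycles. The following Python sketch reproduces the tabulated values under that assumption; the one-address-cycle overhead model and the function names are illustrative inferences, not statements from this specification.

```python
# Sketch only: assumes each transaction costs one address cycle plus its
# data cycles, and that "Mbytes" means 10**6 bytes.
BYTES_PER_CYCLE = 8  # the NDAL moves one quadword (64 bits) per cycle

# Number of quadword data cycles for each transaction length.
DATA_CYCLES = {"quadword": 1, "octaword": 2, "hexaword": 4}


def raw_bandwidth(cycle_ns):
    """Raw bandwidth in Mbytes/sec: one quadword every NDAL cycle."""
    return BYTES_PER_CYCLE / cycle_ns * 1000.0


def usable_bandwidth(length, cycle_ns):
    """Usable bandwidth in Mbytes/sec for a read or write of this length."""
    data_cycles = DATA_CYCLES[length]
    total_cycles = 1 + data_cycles  # assumed: one address cycle of overhead
    return data_cycles * BYTES_PER_CYCLE / (total_cycles * cycle_ns) * 1000.0


if __name__ == "__main__":
    for cycle_ns in (30, 42):
        for length in DATA_CYCLES:
            print(cycle_ns, length, round(usable_bandwidth(length, cycle_ns)))
```

Under this model the longer transactions amortize the address cycle over more data, which is why hexaword transfers approach the raw figure most closely.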



Table 3-5 details each NDAL signal. Where All is indicated for Drivers and Receivers, all four 
possible NDAL nodes drive or receive the signal. 

Table 3-5: NDAL Signals 

Signal              Type(1)  Drivers    Receivers     Function 

Arbitration signals 
P%CPU_REQ_L         SS       NVAX       Arbiter       NVAX requests the bus. 
I01_REQ_L           SS       I01_NODE   Arbiter       I01_NODE requests the bus. 
I02_REQ_L           SS       I02_NODE   Arbiter       I02_NODE requests the bus. 
P%CPU_HOLD_L        SS       NVAX       Arbiter       Extends P%CPU_GRANT_L. 
I01_HOLD_L          SS       I01_NODE   Arbiter       Extends I01_GRANT_L. 
I02_HOLD_L          SS       I02_NODE   Arbiter       Extends I02_GRANT_L. 
P%CPU_GRANT_L       SS       Arbiter    NVAX          Grants NVAX the bus. 
I01_GRANT_L         SS       Arbiter    I01_NODE      Grants I01_NODE the bus. 
I02_GRANT_L         SS       Arbiter    I02_NODE      Grants I02_NODE the bus. 
P%CPU_SUPPRESS_L    SS       NVAX       Arbiter       Suppresses all but writebacks and fills. 
P%CPU_WB_ONLY_L     SS       Arbiter    NVAX          Limits NVAX to doing only Disown Writes or NOPs. 
I01_SUPPRESS_L      SS       I01_NODE   Arbiter       Suppresses all but writebacks and fills. 
I01_WB_ONLY_L       SS       Arbiter    I01_NODE      I01_NODE may only do Disown Writes and fills. 
I02_SUPPRESS_L      SS       I02_NODE   Arbiter       Suppresses all but writebacks and fills. 
I02_WB_ONLY_L       SS       Arbiter    I02_NODE      I02_NODE may only do Disown Writes and fills. 

Data, address, and command signals 
P%NDAL_H<63:0>      T        All        All           Multiplexed data and address lines. 
P%CMD_H<3:0>        T        All        All           Command being performed this cycle. 
P%ID_H<2:0>         T        All        All           Commander identification for the transaction. 
P%PARITY_H<2:0>     T        All        All           Parity for P%NDAL_H, P%CMD_H, and P%ID_H. 
P%ACK_L             OD       All        All           NDAL acknowledgement of receipt. 

Clock signals 
P%SYS_RESET_L       SS       NVAX       All but NVAX  Resets all nodes. 
PHI12_H             SS       NVAX       All           PHI12 clock for all bus residents. 
PHI23_H             SS       NVAX       All           PHI23 clock for all bus residents. 
PHI34_H             SS       NVAX       All           PHI34 clock for all bus residents. 
PHI41_H             SS       NVAX       All           PHI41 clock for all bus residents. 

(1) Indicates whether the pin is Single Source (SS), Tristate (T), or Open Drain (OD) 

3.3.1 Terms 

In order to clearly describe the transactions which occur on the NDAL, the following terms are 
used: 

• Node - A node is a hardware device that connects to the NDAL. The largest NDAL system 
configuration will support 4 nodes. 

• Transfer - A transfer is the smallest quantum of work that occurs on the NDAL. Typical 
examples of transfers are the address cycle of a read, the address cycle of a write, and each 
data cycle of a write. 

• Transaction - A transaction is composed of one or more transfers. Transaction is the name 
given to the logical task being performed (e.g., read); in the case of the read specifically, the 
transaction consists of a command transfer followed some time later by a return data transfer. 
See Commander, Responder, Transmitter, and Receiver below. 

• Commander - The commander is the node that initiated the transaction in progress. In 
any write transaction, the commander is the node that requested the write; for reads, the 
commander is the one who requested the data. The distinction of being the commander in a 
transaction holds for the duration of the transaction in spite of the fact that in some cases it 
might appear that the commander changes. A case in point is where the commander initiates 
a read transaction. It is the responder (data source) that initiates the return data transfer, 
but the node that requested the data is still the commander. 

• Responder - The responder is the complement to the commander in a transaction. 

• Transmitter - The transmitter during an NDAL cycle is the node that is driving the 
information on the NDAL. Using the read transaction as an example, the commander is the 
transmitter during the command transfer; during the return data transfer the commander is 
the receiver. 

• Receiver - The receiver receives the data being moved during a transfer. 

• Naturally Aligned - Refers to a data quantity whose address could be specified as an offset, 
from the beginning of memory, of an integral number of data elements of the same size. The 
lower address bits of a piece of naturally aligned data are zero. 

• ETM - Error Transition Mode. The backup cache enters Error Transition Mode when an error 
occurs. While in ETM, the state of the backup cache is preserved as much as possible. It 
continues to service requests to blocks which it owns, since those contain the only valid copy 
of data in the system. ETM is described completely in Chapter 13. 

• Address cycle - The cycle during which the address of the transaction is transmitted on the 
NDAL. This is the first cycle of a read or write. 

• Data cycle - A cycle during which the NDAL transfers data. These include data cycles of a 
write and fill data cycles. 

• Read Data Return - This is the command used during a cycle in which a responder is returning 
read data to a commander. These cycles are also referred to as fills. 

3.3.2 NDAL Clocking 

The NDAL is a four-phase bus. NVAX drives four two-phase overlapping clocks to the other chips 
on the NDAL as well as back to itself, as shown in Table 3-6. 



Table 3-6: NDAL clocks 

NVAX output pin   NDAL clock   NVAX input pin 
P%PHI12_OUT_H     PHI12_H      P%PHI12_IN_H 
P%PHI23_OUT_H     PHI23_H      P%PHI23_IN_H 
P%PHI34_OUT_H     PHI34_H      P%PHI34_IN_H 
P%PHI41_OUT_H     PHI41_H      P%PHI41_IN_H 



See Chapter 17 for more details. 



3.3.3 NDAL Arbitration 

The NDAL protocol can architecturally support up to 4 nodes, which consist of one NVAX CPU 
and three interfaces to memory or I/O. This spec assumes one interface to memory and two 
interfaces to I/O. The I/O interfaces are referred to as I01_NODE and I02_NODE. The non-CPU 
nodes may or may not contain caches. 

At a given time, any or all of the nodes may desire the use of the NDAL. Arbitration cycles occur 
in parallel with data transfer cycles using a set of lines dedicated specifically for arbitration. 

Figure 3-3 shows the connection of the arbitration signals on the fully-configured NDAL. This 
arbitration scheme assumes that the arbiter is built into the memory interface. If the arbiter 
were built as a separate chip, the memory interface would need its own request, hold, grant, 
suppress, and wb_only lines. When the arbiter is built into the memory interface, the memory 
interface can withhold grant if its input queues are filling up. 

Figure 3-3: NDAL Arbitration Block Diagram 

The following sections describe the NDAL arbitration signals. 

3.3.3.1 NDAL Arbitration Signals 

3.3.3.1.1 P%CPU_REQ_L 

NVAX asserts P%CPU_REQ_L to request the NDAL for the following cycle. P%CPU_REQ_L 
is a unidirectional signal from NVAX to the arbiter. 

3.3.3.1.2 I01_REQ_L 

I01_NODE, an interface node, asserts I01_REQ_L when it wants to drive the NDAL. I01_REQ_L 
is a unidirectional signal from I01_NODE to the arbiter. 

3.3.3.1.3 I02_REQ_L 

I02_NODE, an interface node, asserts I02_REQ_L when it wants to drive the NDAL. I02_REQ_L 
is a unidirectional signal from I02_NODE to the arbiter. 

3.3.3.1.4 P%CPU_HOLD_L 

The NVAX CPU asserts P%CPU_HOLD_L in order to gain access to the NDAL for consecutive 
cycles. The NVAX CPU asserts P%CPU_HOLD_L only when P%CPU_GRANT_L is asserted. 
Assertion of P%CPU_HOLD_L guarantees that NVAX may retain ownership of the NDAL in the 
next cycle, independent of the value of any other outstanding requests. The arbiter must grant 
the bus to the CPU if the CPU asserts P%CPU_HOLD_L. 

P%CPU_HOLD_L is used for multicycle transfers, allowing NVAX to acquire consecutive cycles. 
NVAX asserts P%CPU_HOLD_L for hexaword Disown Write transactions, in order to transfer 
the four quadwords of data consecutively and directly after the address cycle; and for quadword 
Write or Disown Write transactions, in order to transfer the one quadword of data directly after 
the address cycle. NVAX never asserts P%CPU_HOLD_L for more than four contiguous cycles. 
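
One way to read the HOLD description is that a node asserts HOLD in every cycle of a write except the last, because each assertion reserves the following cycle. The Python sketch below works through that reading; the function name and the cycle-by-cycle model are assumptions made for illustration, not part of the specification.

```python
# Sketch only: models HOLD as "asserted in each cycle whose successor
# still belongs to the same write". This is an interpretation of the text.
def hold_schedule(data_cycles):
    """Return the HOLD value for each cycle of a write.

    Index 0 is the address cycle; the remaining entries are data cycles.
    """
    total_cycles = 1 + data_cycles
    # HOLD is needed in every cycle except the last one of the transaction.
    return [cycle < total_cycles - 1 for cycle in range(total_cycles)]


if __name__ == "__main__":
    print(hold_schedule(4))  # hexaword Disown Write: address + 4 quadwords
    print(hold_schedule(1))  # quadword Write: address + 1 quadword
```

Under this reading a hexaword Disown Write asserts HOLD for four contiguous cycles, which matches the stated upper limit.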

3.3.3.1.5 I01_HOLD_L 

I01_HOLD_L is analogous to P%CPU_HOLD_L. It performs HOLD functionality for 
I01_NODE. 

I01_NODE may not assert I01_HOLD_L unless I01_GRANT_L is asserted during the current 
NDAL cycle. Assertion of I01_HOLD_L guarantees that I01_NODE may retain ownership of the 
NDAL in the next cycle, independent of the value of any other outstanding requests. The arbiter 
must grant the bus to I01_NODE if it asserts I01_HOLD_L. 

I01_HOLD_L is used for multicycle transfers, allowing I01_NODE to acquire consecutive 
cycles. In a hexaword write transaction, for instance, I01_NODE asserts I01_HOLD_L in order 
to transfer the four quadwords of data consecutively. I01_HOLD_L may also be used to transfer 
Fill data in consecutive cycles. I01_HOLD_L may be asserted for a maximum of four contiguous 
cycles. 

3.3.3.1.6 I02_HOLD_L 

I02_HOLD_L is analogous to I01_HOLD_L. It performs HOLD functionality for I02_NODE. 

3.3.3.1.7 P%CPU_SUPPRESS_L 

NVAX asserts P%CPU_SUPPRESS_L in order to suppress new NDAL transactions which NVAX 
treats as cache coherency requests. It does this when its two-entry cache coherency queue (the 
NDAL_IN_QUEUE) is in danger of overflowing. 

During the cycle when P%CPU_SUPPRESS_L is asserted, NVAX will accept a new transaction. 
NVAX requires transactions in the following cycle to be suppressed. 

While P%CPU_SUPPRESS_L is asserted, only fills and writebacks are allowed to proceed 
from non-CPU nodes. The CPU may continue to put all transactions onto the bus (as long 
as P%CPU_WB_ONLY_L is not asserted). Because the NDAL_IN_QUEUE is full and takes the 
highest priority within the Cbox, NVAX is mostly working on cache coherency transactions while 
P%CPU_SUPPRESS_L is asserted, which may cause NVAX to issue WDISOWNs on the NDAL. 
However, NVAX may and does issue any type of transaction while P%CPU_SUPPRESS_L is 
asserted. 

3.3.3.1.8 I01_SUPPRESS_L 

I01_NODE can suppress new transactions on the NDAL by asserting I01_SUPPRESS_L. Fills 
and writebacks will proceed as usual. 

3.3.3.1.9 I02_SUPPRESS_L 

I02_NODE can suppress new transactions on the NDAL by asserting I02_SUPPRESS_L. Fills 
and writebacks will proceed as usual. 

3.3.3.1.10 P%CPU_GRANT_L 

P%CPU_GRANT_L is asserted to notify NVAX that it must drive the NDAL during the following 
cycle. When P%CPU_GRANT_L is asserted, NVAX must drive the bus with a valid command 
and correct parity. If NVAX did not request the NDAL, it drives the bus with a NOP. It only 
drives a non-NOP command if it actually requested the NDAL in the previous cycle. 

If NVAX asserts P%CPU_HOLD_L, P%CPU_GRANT_L must be asserted in the next cycle. 
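
The grant behavior described above can be summarized as a small decision function. This Python sketch is illustrative only; the function and parameter names are hypothetical, and a command is represented simply by its mnemonic string.

```python
# Sketch only: what NVAX drives in a cycle, per the GRANT description.
def cpu_drive(granted, requested_prev_cycle, pending_command):
    """Return the command NVAX drives, or None if it does not drive the bus.

    granted              -- P%CPU_GRANT_L was asserted for this cycle
    requested_prev_cycle -- NVAX asserted P%CPU_REQ_L in the previous cycle
    pending_command      -- mnemonic NVAX wants to issue, or None
    """
    if not granted:
        return None  # no grant: NVAX does not drive the NDAL
    if requested_prev_cycle and pending_command is not None:
        return pending_command  # a real request may carry a non-NOP command
    return "NOP"  # granted without a request: drive a NOP


if __name__ == "__main__":
    print(cpu_drive(True, True, "WDISOWN"))
    print(cpu_drive(True, False, "WDISOWN"))
    print(cpu_drive(False, True, "WDISOWN"))
```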

3.3.3.1.11 I01_GRANT_L 

The arbiter asserts I01_GRANT_L when I01_NODE is permitted to drive the bus. When 
I01_GRANT_L is asserted, I01_NODE must drive the bus with a valid command and correct 
parity. If I01_HOLD_L is asserted, I01_GRANT_L must be asserted in the next cycle. 

3.3.3.1.12 I02_GRANT_L 

I02_GRANT_L is analogous to I01_GRANT_L. It grants the bus to I02_NODE. 

3.3.3.1.13 P%CPU_WB_ONLY_L 

When P%CPU_WB_ONLY_L is asserted, NVAX will only issue Write Disown or NOP commands, 
including Write Disowns due to Write Unlocks when the cache is off or in ETM. Otherwise, NVAX 
will not issue any new requests. During the cycle in which P%CPU_WB_ONLY_L is asserted, 
the system must be prepared to accept one more non-writeback command from the CPU. Starting 
with the cycle following the assertion of P%CPU_WB_ONLY_L, NVAX will only issue writeback 
commands. 

3.3.3.1.14 I01_WB_ONLY_L 

I01_WB_ONLY_L is driven by the arbiter and received by I01_NODE. When I01_WB_ONLY_L 
is asserted, I01_NODE only arbitrates for the bus in order to return fills or disown writes. It does not 
initiate any new transactions. 

3.3.3.1.15 I02_WB_ONLY_L 

I02_WB_ONLY_L is driven by the arbiter and received by I02_NODE. When I02_WB_ONLY_L 
is asserted, I02_NODE only arbitrates for the bus in order to return fills or disown writes. It does not 
initiate any new transactions. 

3.3.3.2 NDAL Arbitration Timing 

The timing for NDAL arbitration is shown in Figure 3-4. There are several critical spots to note 
in the diagram. The arbiter receives the request lines by the end of P1. It must drive the grant 
lines to valid values by the end of P3. It has two phases to calculate arbitration and to drive the 
grant lines across the board. 

In the fastest system (10ns NVAX), the arbiter has 15ns after receiving the request lines to 
arbitrate and to drive the grant lines. Board simulations for one system show that driving the 
grant lines will take about half that time. 

From the time a bus driver receives its grant line, it has three phases to drive P%NDAL_H<63:0>, 
P%CMD_H<3:0>, P%ID_H<2:0>, and P%PARITY_H<2:0> to valid levels. 

From the time the NDAL is valid on its pins, the receiver has four phases to compute parity and 
to assert P%ACK_L. 
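
The phase budgets above can be converted to nanoseconds for a given NDAL cycle time. The Python sketch below assumes that the four phases of an NDAL cycle are of equal length, which agrees with the 15ns figure quoted for a 30ns NDAL cycle but is otherwise an assumption; the names are illustrative.

```python
# Sketch only: assumes four equal-length phases per NDAL cycle.
PHASES_PER_CYCLE = 4


def phase_budget_ns(ndal_cycle_ns, phases):
    """Time available, in ns, for an activity allotted this many phases."""
    return ndal_cycle_ns * phases / PHASES_PER_CYCLE


if __name__ == "__main__":
    for cycle_ns in (30, 42):
        print(cycle_ns,
              phase_budget_ns(cycle_ns, 2),   # arbiter: requests to grants
              phase_budget_ns(cycle_ns, 3),   # driver: grant to valid bus
              phase_budget_ns(cycle_ns, 4))   # receiver: valid bus to ACK
```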

3.3.3.3 NDAL Suppress and Its Timing 

When any node asserts its suppress line, only writebacks and fills may be driven onto the 
bus, starting in the following cycle. For example, when P%CPU_SUPPRESS_L 
is asserted, the arbiter can accomplish this in the following way: if P%CPU_SUPPRESS_L is 
asserted during cycle 0, the arbiter does not grant the bus to any node, with the possible exception 
of the CPU, in cycle 0. At the same time it asserts I01_WB_ONLY_L and I02_WB_ONLY_L. In cycle 
1, the arbiter continues to perform bus arbitration as it normally would, but now I01_NODE and 
I02_NODE recognize the assertion of their respective WB_ONLY lines, and they do not request 
the bus except for fills and writebacks. 

From this, it may be seen that the assertion of P%CPU_SUPPRESS_L causes the arbiter 
to assert I01_WB_ONLY_L and I02_WB_ONLY_L; the assertion of I01_SUPPRESS_L causes 
the arbiter to assert P%CPU_WB_ONLY_L and I02_WB_ONLY_L; and the assertion of 
I02_SUPPRESS_L causes the arbiter to assert P%CPU_WB_ONLY_L and I01_WB_ONLY_L. 
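
The mapping from SUPPRESS inputs to WB_ONLY outputs can be sketched as follows. The node labels and function name are illustrative, and the handling of several simultaneous SUPPRESS assertions (taking the union of the individual cases) is an assumption not spelled out in the text.

```python
# Sketch only: arbiter mapping from SUPPRESS inputs to WB_ONLY outputs.
NODES = ("CPU", "I01", "I02")


def wb_only_outputs(suppressing):
    """Return the set of nodes whose WB_ONLY line the arbiter asserts.

    suppressing -- iterable of node labels currently asserting SUPPRESS.
    Each suppressing node causes WB_ONLY to be asserted to every other node.
    """
    asserted = set()
    for source in suppressing:
        if source not in NODES:
            raise ValueError("unknown node: %r" % (source,))
        asserted.update(node for node in NODES if node != source)
    return asserted


if __name__ == "__main__":
    print(sorted(wb_only_outputs({"CPU"})))
    print(sorted(wb_only_outputs({"I01"})))
```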

The timing for suppression of the bus is shown in Figure 3-5. In this example, the CPU suppresses 
the bus by asserting P%CPU_SUPPRESS_L, which is valid at the end of P1 in NDAL cycle 0. 
The arbiter immediately asserts I01_WB_ONLY_L and I02_WB_ONLY_L, which are valid by 
the end of P3 in the same cycle. This notifies I01_NODE and I02_NODE that they should not 
arbitrate for the bus for new transactions, only for writebacks and fills. (If the I/O chip cannot 
suppress its request line quickly enough, it may drive NOPs onto the NDAL if it gets GRANT, 
instead of withdrawing its request in the first cycle.) Accordingly, in NDAL cycle 1 as shown 
in the diagram, I01_REQ_L is deasserted by I01_NODE, since it has only a read or a write request 
to do. I02_REQ_L remains asserted because I02_NODE has a fill to do. 

During the cycle in which P%CPU_SUPPRESS_L is asserted, the arbiter does not grant to any 
node with the exception of the CPU. Since the CPU is the one suppressing the bus, it should be 
allowed to continue issuing transactions on the bus. 

If a node had its HOLD line asserted and it had been granted the bus in the cycle before, it 
would get grant under suppress. The rules for HOLD override the rules for SUPPRESS. 

In NDAL cycle 1, the bus is granted to I02_NODE, which has arbitrated to do its fill. The fill is 
driven in NDAL cycle 2. 

3.3.3.4 NDAL Arbitration Rules 

The rules of arbitration are as follows: 

1. Any node may assert its request line during any cycle. 

2. A node's grant line must be asserted before that node drives the NDAL. 

3. An NDAL driver may only assert its HOLD_L line if it has been granted the bus for the 
current cycle. 

4. If a node has been granted the bus, and it asserts HOLD, it is guaranteed to be granted the 
bus in the following cycle. 

5. HOLD may only be used in two cases: (a) to hold the bus for the data cycles of a write; (b) to 
send consecutive fill cycles. 

6. HOLD must be used to retain the bus for the data cycles of a write, as the data cycles must 
be contiguous with the write address cycle. 

7. HOLD must not be used to retain the bus for new transactions, as arbitration fairness would 
not be maintained. 

8. If a node requests the bus and is granted the bus, it must drive the NDAL during the granted 
cycle with a valid command. NOP is a valid command. NVAX takes this a step further and 
drives NOP if it is granted the bus when it did not request it. 

9. Any node which issues a read must be able to accept the corresponding fills as they cannot 
be suppressed or slowed. 

10. If a node's WB_ONLY line is asserted, it may only drive the NDAL with NOP, RDE, RDRn, 
WDISOWN, WDATA, or BADWDATA. 

11. If a node asserts its SUPPRESS line, the arbiter must not grant the bus to any node 
except that one in the next cycle. At the same time the arbiter must assert the appropriate 
WB_ONLY lines. In the following cycle, the arbiter must grant the bus normally. 

12. The rules for HOLD override the rules for SUPPRESS. 

13. The bus must be actively driven during every cycle. 

Specifics on arbitration algorithms may be found in the system specs for each NVAX system. 
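
Rules 3, 5, 7, and 10 lend themselves to simple legality checks. The Python sketch below is one possible encoding; the mnemonics RDR0 through RDR3 stand in for the four "RDRn" commands and are assumed names, as are the function names and the purpose labels.

```python
# Sketch only: legality checks for rule 10 (WB_ONLY) and rules 3, 5, 7 (HOLD).
# "RDR0".."RDR3" are assumed spellings for the four RDRn commands.
WB_ONLY_ALLOWED = frozenset(
    ["NOP", "RDE", "RDR0", "RDR1", "RDR2", "RDR3",
     "WDISOWN", "WDATA", "BADWDATA"]
)

# Purposes for which HOLD may be used (rule 5); labels are hypothetical.
HOLD_PURPOSES = frozenset(["write_data", "fill"])


def command_allowed(command, wb_only_asserted):
    """Rule 10: under WB_ONLY, only the listed commands may be driven."""
    return (not wb_only_asserted) or command in WB_ONLY_ALLOWED


def hold_allowed(granted_this_cycle, purpose):
    """Rules 3, 5, 7: HOLD needs a current grant and a permitted purpose."""
    return granted_this_cycle and purpose in HOLD_PURPOSES


if __name__ == "__main__":
    print(command_allowed("DREAD", wb_only_asserted=True))
    print(hold_allowed(True, "new_transaction"))
```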

3.3.4 NDAL Information Transfer 

3.3.4.1 P%NDAL_H<63:0> 

The use of this field is multiplexed between address and data information. On data cycles the 
lines represent 64 bits of read or write data; on address cycles the lines represent address, byte 
enable, and length information. 

There are four types of data cycles: Write Data, Bad Write Data, Read Data Return, and 
Read Data Error. During write data cycles the commander drives its Commander ID on 
P%ID_H<2:0> and drives data on P%NDAL_H<63:0>. The full 64 bits of data are written 
during hexaword writes. For octaword and quadword length writes, the data bytes which are 
written correspond to the byte enable bits which were asserted during the address cycle which 
initiated the transaction. During Read Data Return and Read Data Error cycles the responder 
drives the original commander ID. 

The NDAL address cycle is used by a commander to initiate an NDAL transaction. On address 
cycles the address is driven in the lower longword of the bus, and the byte enable and transaction 
length are in the upper longword, as shown in Figure 3-6. 

Figure 3-6: Address Cycle Format 

Each field shown in the diagram is described in the sections which follow. 

3.3.4.1.1 Address Field 

The address space supported by the NDAL is divided into memory space and I/O space. 

The lower 32 bits of the address cycle P%NDAL_H<31:0> define the address of an NDAL read 
or write transaction. The NDAL supports a 4 Gigabyte (2**32 byte) address space. The most 
significant bits of this address (corresponding to lines P%NDAL_H<31:29>) select 512-Mbyte I/O 
space (P%NDAL_H<31:29> = 111) or 3.5-Gbyte memory space (P%NDAL_H<31:29> = 000..110). 
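
The memory/I/O split can be checked with a short decode of address bits <31:29>. This Python sketch is illustrative; the function name and the string labels are not taken from the specification.

```python
# Sketch only: classify a 32-bit NDAL address by bits <31:29>.
def address_space(address):
    """Return "io" when address<31:29> = 111, otherwise "memory"."""
    if not 0 <= address < 2 ** 32:
        raise ValueError("NDAL addresses are 32 bits")
    return "io" if (address >> 29) == 0b111 else "memory"


IO_SPACE_BYTES = 1 * 2 ** 29      # one of the eight 512-Mbyte regions
MEMORY_SPACE_BYTES = 7 * 2 ** 29  # the remaining seven regions

if __name__ == "__main__":
    print(address_space(0xE0000000), address_space(0xDFFFFFFF))
```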

Figure 3-7 illustrates the division of the address space into memory space and I/O space. 

The division of the NDAL address space in the I/O region is further defined to accommodate the 
need for NDAL node and I/O node address space. More information about the division of I/O 
space may be found in Chapter 2. 

Figure 3-7: Physical Address Space Layout 

Address bits <31:0> are all significant bits in an address to I/O space. Although the length field 
on the NDAL is always quadword for I/O space reads and writes, the actual amount of data 
read or written may be less than a quadword. The byte enable is used to read or write the 
requested bytes only. If the byte enable indicates a 1-byte read or write, every bit of the address 
is significant. The lower bits of the address are provided so that the I/O adapters do not have to 
deduce the address from the byte enable. 

The number of significant bits in an address to MEMORY depends on the transaction type and 
length as shown in Figure 3-8. 

Figure 3-8: NDAL Memory Address Interpretation 

                            A<i>, i =    4 3 2 1 0 
                                        +-+-+-+-+-+ 
Read quadword, octaword, hexaword       |s|s|d|d|d| 
                                        +-+-+-+-+-+ 
Write quadword                          |s|s|d|d|d| 
                                        +-+-+-+-+-+ 
Write octaword                          |s|d|d|d|d| 
                                        +-+-+-+-+-+ 
Write hexaword                          |d|d|d|d|d| 
                                        +-+-+-+-+-+ 

s = significant 
d = don't care 


It can be seen from the figure that bits A<4:3> are significant address bits or don't care, depending 
on the function being requested. 

All reads have significant bits down to the quadword. Although fills may be returned in any order, 
there is a performance advantage if memory returns the requested quadword first. The NDAL 
protocol identifies each quadword using one of the four Read Data Return commands, so that 
quadwords can be placed in correct locations regardless of the order in which they are returned. 

Quadword, octaword and hexaword writes are always naturally aligned and driven on the NDAL 
in order from the lowest-addressed quadword to the highest. 
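
The address interpretation for memory space can be expressed as a mask on the low address bits: reads are significant down to the quadword, and writes are significant down to their own naturally aligned size. The Python sketch below illustrates this; the function and argument names are hypothetical.

```python
# Sketch only: which low address bits matter for a memory-space transaction.
WRITE_ALIGNMENT = {"quadword": 8, "octaword": 16, "hexaword": 32}


def effective_memory_address(address, kind, length):
    """Clear the don't-care low bits of a memory-space address.

    kind   -- "read" or "write"
    length -- "quadword", "octaword", or "hexaword"
    """
    if length not in WRITE_ALIGNMENT:
        raise ValueError("unknown length: %r" % (length,))
    if kind == "read":
        alignment = 8  # all reads are significant down to the quadword
    elif kind == "write":
        alignment = WRITE_ALIGNMENT[length]  # writes are naturally aligned
    else:
        raise ValueError("kind must be 'read' or 'write'")
    return address & ~(alignment - 1)


if __name__ == "__main__":
    print(hex(effective_memory_address(0x1234567F, "read", "hexaword")))
    print(hex(effective_memory_address(0x1234567F, "write", "hexaword")))
```

For a hexaword read, the surviving A<4:3> bits identify which quadword the commander would like returned first.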

3.3.4.1.2 Byte Enable Field 

The Byte Enable field is located in P%NDAL_H<55:40> during the address cycle. It is used to 
supply byte-level enable information for quadword-length OREADs, IREADs, DREADs, WRITEs, 
and WDISOWNs and octaword-length WRITEs and WDISOWNs. Of these transactions, NVAX 
generates only quadword IREADs and DREADs to I/O space, quadword WRITEs to I/O space, 
and quadword WRITEs and WDISOWNs to memory space. 

If the byte enable is a "1", the byte is to be read or written. If it is a "0", the byte is not read or 
written. 

NOTE 

During quadword-length transactions the high portion of the byte enable field, located 
in P%NDAL_H<55:48>, is ignored. Commanders may drive any data pattern they 
wish in this field as long as it has correct parity. Responders must not depend on a 
certain defined pattern (such as all zeros). 

During hexaword-length transactions the entire byte enable field is ignored. During 
hexaword transactions, commanders are permitted to drive any data pattern they wish 
in this field as long as it has correct parity. Responders must not depend on a certain 
defined pattern (such as all zeros). 

During octaword-length transactions, the byte enable located in P%NDAL_H<47:40> 
always corresponds to the low-order quadword of the octaword. The byte enable 
located in P%NDAL_H<55:48> always corresponds to the high-order quadword of the 
octaword. 

The correspondence between bits in the enable and bytes of the data is shown in Table 3-7 and 
Table 3-8. 




Table 3-7: Byte Enable for Quadword Reads and Writes 

Byte Enable Bit   Data Byte 
P%NDAL_H<47>      P%NDAL_H<63:56> 
P%NDAL_H<46>      P%NDAL_H<55:48> 
P%NDAL_H<45>      P%NDAL_H<47:40> 
P%NDAL_H<44>      P%NDAL_H<39:32> 
P%NDAL_H<43>      P%NDAL_H<31:24> 
P%NDAL_H<42>      P%NDAL_H<23:16> 
P%NDAL_H<41>      P%NDAL_H<15:08> 
P%NDAL_H<40>      P%NDAL_H<07:00> 
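
Table 3-8: Byte Enable for Octaword Writes 

                  First data cycle       Second data cycle 
Byte Enable Bit   Quadword 0 Data Byte   Quadword 1 Data Byte 
P%NDAL_H<55>                             P%NDAL_H<63:56> 
P%NDAL_H<54>                             P%NDAL_H<55:48> 
P%NDAL_H<53>                             P%NDAL_H<47:40> 
P%NDAL_H<52>                             P%NDAL_H<39:32> 
P%NDAL_H<51>                             P%NDAL_H<31:24> 
P%NDAL_H<50>                             P%NDAL_H<23:16> 
P%NDAL_H<49>                             P%NDAL_H<15:08> 
P%NDAL_H<48>                             P%NDAL_H<07:00> 
P%NDAL_H<47>      P%NDAL_H<63:56> 
P%NDAL_H<46>      P%NDAL_H<55:48> 
P%NDAL_H<45>      P%NDAL_H<47:40> 
P%NDAL_H<44>      P%NDAL_H<39:32> 
P%NDAL_H<43>      P%NDAL_H<31:24> 
P%NDAL_H<42>      P%NDAL_H<23:16> 
P%NDAL_H<41>      P%NDAL_H<15:08> 
P%NDAL_H<40>      P%NDAL_H<07:00> 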

Table 3-9 illustrates possible bit patterns in the byte enable for transactions which NVAX 
generates. Only transactions in which the byte enable is valid are listed. 

NVAX will generate every possible byte enable for every possible address for quadword WRITEs 
and WDISOWNs to memory space, as shown by the table. IREADs to I/O space will always 
request a full quadword of data by asserting all the byte enable bits. 

DREADs and WRITEs to I/O space are issued using the quadword length NDAL encoding, but 
the requests are for only a byte, word, or longword at a time, as indicated by the byte enable 
given in the command cycle of a transaction. References that are unaligned across a naturally 
aligned quadword are decomposed into two separate requests for the bytes in each quadword; 
where this is the case, Table 3-9 shows the byte enable values for both references generated. In 
the cases where a second request is generated, the address is incremented by 8, which addresses 
the next quadword in I/O space, but address bits <2:0> are 000 (binary). 

When the NVAX CPU does an I/O space read for an interrupt acknowledge (IAK read), it always 
generates a longword-aligned word-length read request. In other words, the byte enable which 
NVAX uses for an IAK read is either 0000 0011 (binary) or 0011 0000 (binary). 

Table 3-9 reflects what NDAL requests the NVAX CPU will generate, depending on the software 
written. Software must take care only to generate requests which make sense in the system 
environment. Specifically, unaligned requests are forbidden by DEC Standard 032. 

Table 3-9: Possible Byte Enables for NVAX-generated transactions 
(I/O space) 

Operand               NDAL 
Address<2:0>   Ref    Address<2:0>   Byte        Word        Longword    Quadword 
000            1st    000            0000 0001   0000 0011   0000 1111   0000 1111 
               2nd    100                                                1111 0000 
001            1st    001            0000 0010   0000 0110   0001 1110   0001 1110 
               2nd    101                                                1110 0000 
               3rd    000                                                0000 0001 
010            1st    010            0000 0100   0000 1100   0011 1100   0011 1100 
               2nd    110                                                1100 0000 
               3rd    000                                                0000 0011 
011            1st    011            0000 1000   0001 1000   0111 1000   0111 1000 
               2nd    111                                                1000 0000 
               3rd    000                                                0000 0111 
100            1st    100            0001 0000   0011 0000   1111 0000   1111 0000 
               2nd    000                                                0000 1111 
101            1st    101            0010 0000   0110 0000   1110 0000   1110 0000 
               2nd    000                                    0000 0001   0000 0001 
               3rd    001                                                0001 1110 
110            1st    110            0100 0000   1100 0000   1100 0000   1100 0000 
               2nd    000                                    0000 0011   0000 0011 
               3rd    010                                                0011 1100 
111            1st    111            1000 0000   1000 0000   1000 0000   1000 0000 
               2nd    000                        0000 0001   0000 0111   0000 0111 
               3rd    011                                                0111 1000 
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3.3.4.1.2.1 I/O space writes 

When the NVAX CPU issues an I/O space write, it always replicates the data identically on the 
high longword and the low longword of the NDAL, although the byte enable indicates that the 
data is only valid in one longword or the other. A system device may take advantage of this fact 
to avoid rotating the data. 

3.3.4.1.3 Length Field 

The length field is used to indicate the amount of data to be read or written for the current 
transaction. Table 3-10 shows how the length values correspond to transaction lengths.

Table 3-10: NDAL Length Field

P%NDAL_H<63:62>    Length

00                 hexaword
01                 unused
10                 quadword
11                 octaword (not used by NVAX CPU)



3.3.4.2 P%CMD_H<3:0> 

The P%CMD_H<3:0> lines specify the current bus transaction during any given cycle. The 
interpretation of the four bits is shown in Table 3-11.

Table 3-11: NDAL Command Encodings and Definitions

Levels  Abbrev.    Bus Transaction            Type   Function

0000    NOP        No Operation               Nop    No Operation
0001    Reserved
0010    WRITE      Write                      Addr   Write to memory with byte enable if
                                                     quadword or octaword
0011    WDISOWN    Write Disown               Addr   Write memory; cache disowns block
                                                     and returns ownership to memory
0100    IREAD      Instruction Stream Read    Addr   Instruction-stream read
0101    DREAD      Data Stream Read           Addr   Data-stream read (without ownership)
0110    OREAD      D-Stream Read Ownership    Addr   Data-stream read claiming ownership
                                                     for the cache
0111    Reserved
1000    Reserved
1001    RDE        Read Data Error            Data   Used instead of Read Data Return in
                                                     the case of an error.
1010    WDATA      Write Data Cycle           Data   Write data is being transferred
1011    BADWDATA   Bad Write Data             Data   Write data with errors is being
                                                     transferred
1100    RDR0       Read Data0 Return (fill)   Data   Read data is returning corresponding
                                                     to QW 0 of a hexaword.
1101    RDR1       Read Data1 Return (fill)   Data   Read data is returning corresponding
                                                     to QW 1 of a hexaword.
1110    RDR2       Read Data2 Return (fill)   Data   Read data is returning corresponding
                                                     to QW 2 of a hexaword.
1111    RDR3       Read Data3 Return (fill)   Data   Read data is returning corresponding
                                                     to QW 3 of a hexaword.
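
As an illustration of how a bus monitor or simulation model might interpret these fields, the following Python sketch decodes P%CMD_H<3:0> and the length field in P%NDAL_H<63:62>. The function names are hypothetical. The encodings for WDISOWN, IREAD, and DREAD are assumed to be 0011, 0100, and 0101, following their position between WRITE (0010) and OREAD (0110) in the command table.

```python
# Command encodings for P%CMD_H<3:0>. Codes 0001, 0111 and 1000 are reserved.
COMMANDS = {
    0b0000: "NOP",
    0b0010: "WRITE",
    0b0011: "WDISOWN",   # assumed from table position
    0b0100: "IREAD",     # assumed from table position
    0b0101: "DREAD",     # assumed from table position
    0b0110: "OREAD",
    0b1001: "RDE",
    0b1010: "WDATA",
    0b1011: "BADWDATA",
    0b1100: "RDR0",
    0b1101: "RDR1",
    0b1110: "RDR2",
    0b1111: "RDR3",
}

# Length encodings for P%NDAL_H<63:62> during an address cycle. Code 01 is unused.
LENGTHS = {0b00: "hexaword", 0b10: "quadword", 0b11: "octaword"}

ADDRESS_CYCLE_COMMANDS = {"WRITE", "WDISOWN", "IREAD", "DREAD", "OREAD"}


def decode_command(cmd_h):
    """Map the four command lines to a command abbreviation."""
    return COMMANDS.get(cmd_h & 0xF, "Reserved")


def decode_length(ndal_h):
    """Extract the transaction length from a 64-bit address-cycle value."""
    return LENGTHS.get((ndal_h >> 62) & 0b11, "unused")


def is_address_cycle(cmd_h):
    """True when the command carries an address (type Addr in the table)."""
    return decode_command(cmd_h) in ADDRESS_CYCLE_COMMANDS
```

A monitor would call `decode_length` only when `is_address_cycle` is true, since the length field is meaningful only in the command cycle of a transaction.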



The NVAX CPU does not implement all transaction lengths with all commands. The commands 
and lengths which it uses are in the table which follows. If NVAX implements the command in 
memory space, MEM is indicated in the table; if it implements the command in I/O space, I/O is 
indicated in the table. 



Table 3-12: NDAL Address Cycle Commands as used by the NVAX CPU

COMMAND    QUADWORD       OCTAWORD    HEXAWORD

IREAD      I/O            -           MEM
DREAD      I/O            -           MEM
OREAD      -              -           MEM
WRITE      MEM(1), I/O    -           -
WDISOWN    MEM(1)         -           MEM

(1) NVAX uses these transactions only when the backup cache is disabled or in Error Transition Mode.

When the cache is off, the NVAX CPU issues OREAD commands of hexaword length,
and corresponding Disown Write commands of quadword length. These correspond to the 
CPU-internal commands of Read Lock and Write Unlock. The lock/ownership granularity in 
memory must not be less than a hexaword. Otherwise, when the CPU did a hexaword OREAD 
followed by a quadword Disown Write, the other three quadwords would be in limbo. The CPU 
would assume that it didn't own them, and memory would believe that they were still owned by 
the cache. 

3.3.4.3 P%ID_H<2:0> 

During the address cycle and return data cycles, P%ID_H<2:0> contain the commander's ID. This
ID is used to identify the source of the request on the address cycle and to associate returning
data with the commander who issued the request on return data cycles.

The commander ID codes available for use by a node are shown in Table 3-13. P%ID_H<2:1>
indicate which node originated the transaction, and P%ID_H<0> indicates which of the two
outstanding reads per node the cycle belongs to.

Table 3-13: Commander P%ID_H Assignments

Node Name           P%ID_H<2:0>

NVAX                00X
memory interface    01X
IO1_NODE            10X
IO2_NODE            11X



During write command and data cycles, P%ID_H<2:0> is driven with the ID of the commander.
P%ID_H<2:1> is driven with the bits identifying the commander, and P%ID_H<0> may be driven
with any value. P%ID_H<0> is not necessarily driven with the same value during the command
cycle of a write and the corresponding data cycles of that write.

Each commander node on the NDAL may have two read transactions outstanding. 

The memory interface is not a commander node, but it has been assigned a commander ID which 
may be used in some NVAX systems. For example, in the XMI2 system, the memory interface 
reflects XMI2 read and write commands into the NDAL for cache coherency reasons. These reads 
and writes are not taken up by any node on the NDAL except to enforce cache coherency. The 
memory interface uses its own ID when driving these reads and writes onto the NDAL. If a write 
is reflected onto the NDAL merely to enforce cache coherency, the WDATA cycles may be omitted. 

3.3.4.4 P%PARITY_H<2:0>

P%PARITY_H<2> is computed over P%CMD_H<3:0> and P%ID_H<2:0>. Even parity is used,
where the "exclusive OR" of all bits including the parity bit is a "0". (All bits, including the parity
bit, have an even number of "1"s.)

P%PARITY_H<2> is inverted, forcing an NDAL parity error, when
CCTL<FORCE_NDAL_PERR> is set. This is described in Chapter 13.






P%PARITY_H<1> is computed over the high longword of the NDAL, P%NDAL_H<63:32>. Odd
parity is used, where the "exclusive OR" of all bits including the parity bit is a "1". (All bits,
including the parity bit, have an odd number of "1"s.)

P%PARITY_H<0> is computed over the low longword of the NDAL, P%NDAL_H<31:0>. Even
parity is used.

Using a combination of odd and even parity means that neither all "1"s nor all "0"s is a legal
bus pattern.
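
The three parity bits can be modeled as follows. This is a minimal sketch for checking the rules above, not the chip's implementation; the function names are hypothetical and the bus values are passed as plain integers.

```python
def xor_of_bits(value):
    """Exclusive OR of all bits of a non-negative integer (1 if the count of 1s is odd)."""
    return bin(value).count("1") & 1


def ndal_parity(cmd_h, id_h, ndal_h):
    """Return the expected P%PARITY_H<2:0> as a 3-bit integer.

    <2>: even parity over P%CMD_H<3:0> and P%ID_H<2:0>
    <1>: odd parity over P%NDAL_H<63:32>
    <0>: even parity over P%NDAL_H<31:0>
    """
    parity2 = xor_of_bits(cmd_h & 0xF) ^ xor_of_bits(id_h & 0x7)
    parity1 = xor_of_bits((ndal_h >> 32) & 0xFFFFFFFF) ^ 1   # odd parity
    parity0 = xor_of_bits(ndal_h & 0xFFFFFFFF)
    return (parity2 << 2) | (parity1 << 1) | parity0


def parity_ok(cmd_h, id_h, ndal_h, parity_h):
    """Check a received cycle, as a receiver does in every cycle."""
    return ndal_parity(cmd_h, id_h, ndal_h) == (parity_h & 0x7)
```

With every command, ID, and data line at 0, the expected parity is 010, so an all-zero bus fails the check. With every line at 1, the expected parity is 110, so an all-one bus fails as well.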

If a device requests the bus and is granted it, but chooses not to use it during a given
cycle, it is responsible for driving the NOP command on P%CMD_H<3:0>. It must drive
P%NDAL_H<63:0>, P%ID_H<2:0>, and P%PARITY_H<2:0> with correct parity.

If NVAX did not request the bus, and it is granted the NDAL anyway, it will drive the NDAL
with a NOP.

When the bus is idle, the arbiter ensures that the NDAL is driven with correct parity. To do this,
the arbiter may take advantage of the fact that NVAX will drive the NDAL with NOP if it is
unexpectedly granted the bus.

The NVAX BIU checks the NDAL for correct parity in every cycle, regardless of the contents
of the bus. It does not distinguish between errors on the command lines or the data lines; it
computes the three parity bits, and if any fail, it responds to the error according to Table 3-21.

Table 3-14: NDAL Parity Coverage

Parity bit       Protected data                Parity type

P%PARITY_H<2>    P%CMD_H<3:0>, P%ID_H<2:0>     even parity
P%PARITY_H<1>    P%NDAL_H<63:32>               odd parity
P%PARITY_H<0>    P%NDAL_H<31:0>                even parity



3.3.4.5 P%ACK_L 

P%ACK_L is an open drain signal which is pulled high (deasserted) by an external resistor on
the board. The resistor is able to pull the node high during the time allotted without assistance
from any other P%ACK_L driver. Thus, a P%ACK_L driver only has to pull the signal low at
the appropriate time.

The receiver for a particular NDAL cycle is responsible for pulling P%ACK_L low (asserted) if it
receives the cycle without parity errors. If another receiver detects a parity error on the cycle, it
reports it by asserting P%H_ERR_L or P%S_ERR_L.

If P%ACK_L is asserted in response to an NDAL cycle, it indicates that the receiving node has
accepted an address cycle or a data cycle. P%ACK_L being asserted for a read address cycle
indicates that the responder will return a read response cycle at a later time. If it is asserted
for a write address cycle, the transfer of the write address is assumed successful. If a cycle is
accompanied by a NOP command, the cycle may or may not be acknowledged by the assertion of
P%ACK_L; NOPs do not have to be acknowledged but they may be.

P%ACK_L is always asserted by the NDAL receiver unless there was a parity error on the bus.
It is NOT used for flow control.

P%ACK_L is also not asserted when there is no node on the NDAL which recognizes the
address space addressed, i.e., transactions to non-existent memory and I/O space will not receive
P%ACK_L assertion.

See Table 3-21 for NVAX response when P%ACK_L is not asserted.

The timing of ACK_L relative to the data or address cycle is shown in Figure 3-9. For a given
transfer, ACK_L is asserted one cycle later. In cycle 0 a read is driven, so ACK_L is asserted in
cycle 1. In cycle 4 a NOP is driven, and in cycle 5 ACK_L is not asserted because NOPs do not
have to be acknowledged.

Figure 3-9: P%ACK_L Timing

3.3.5 NDAL Transactions

The following sections describe the entire set of NDAL transactions. 

Table 3-15 shows the entire set of NDAL commands and how they are used by NVAX.

In memory space, NVAX issues all reads with hexaword length. Normal writes to memory space 
are always quadword length, and Disown Writes are quadword or hexaword. When the cache 
is operating normally, Disown Writes are only issued in hexaword length. When the cache is in 
ETM, NVAX issues Disown Writes of both hexaword and quadword length. When the cache is 
off, NVAX issues only quadword Disown Writes. NVAX issues quadword Disown Writes only as 
the result of an interlock operation. 

In I/O space, the ownership commands (OREAD and Disown Write) are not defined at all. NVAX
issues only quadword operations in I/O space. NVAX never uses the BADWDATA command in
I/O space.

Table 3-15: NDAL Command Usage by NVAX

Address               Used by    Length    Length    Length
Space      Command    NVAX       QW        OW        HW

N/A        Nop        yes        -         -         -
N/A        Reserved   no         -         -         -
Memory     WRITE      yes        yes       no        no
Memory     WDISOWN    yes        yes       no        yes
Memory     IREAD      yes        no        no        yes
Memory     DREAD      yes        no        no        yes
Memory     OREAD      yes        no        no        yes
Memory     RDE        no         -         -         -
Memory     WDATA      yes        -         -         -
Memory     BADWDATA   yes        -         -         -
Memory     RDR0       no         -         -         -
Memory     RDR1       no         -         -         -
Memory     RDR2       no         -         -         -
Memory     RDR3       no         -         -         -
I/O        WRITE      yes        yes       no        no
I/O        WDISOWN    no         no        no        no
I/O        IREAD      yes        yes       no        no
I/O        DREAD      yes        yes       no        no
I/O        OREAD      no         no        no        no
I/O        RDE        no         -         -         -
I/O        WDATA      yes        -         -         -
I/O        BADWDATA   no         -         -         -
I/O        RDR0       no         -         -         -
I/O        RDR1       no         -         -         -
I/O        RDR2       no         -         -         -
I/O        RDR3       no         -         -         -












Table 3-16 shows the usage of NDAL commands by NDAL devices other than NVAX. The
ownership commands (OREAD and WDISOWN) are not defined at all in I/O space. Although
nodes may use OREAD and WDISOWN of lengths other than hexaword, they must be aware
of the memory coherency problems connected with using lengths other than hexaword for these
operations. Memory defines ownership along hexaword boundaries.

Table 3-16: NDAL Command Usage by NDAL nodes besides NVAX

                      Used
Address               by NDAL    Length    Length    Length
Space      Command    nodes      QW        OW        HW

N/A        Nop        yes        -         -         -
N/A        Reserved   no         -         -         -
Memory     WRITE      yes        yes       yes       yes
Memory     WDISOWN    yes        yes       yes       yes
Memory     IREAD      yes        yes       yes       yes
Memory     DREAD      yes        yes       yes       yes
Memory     OREAD      yes        yes       yes       yes
Memory     RDE        yes        -         -         -
Memory     WDATA      yes        -         -         -
Memory     BADWDATA   yes        -         -         -
Memory     RDR0       yes        -         -         -
Memory     RDR1       yes        -         -         -
Memory     RDR2       yes        -         -         -
Memory     RDR3       yes        -         -         -
I/O        WRITE      yes        yes       yes       yes
I/O        WDISOWN    no         no        no        no
I/O        IREAD      yes        yes       yes       yes
I/O        DREAD      yes        yes       yes       yes
I/O        OREAD      no         no        no        no
I/O        RDE        yes        -         -         -
I/O        WDATA      yes        -         -         -
I/O        BADWDATA   yes        -         -         -
I/O        RDR0       yes        -         -         -
I/O        RDR1       yes        -         -         -
I/O        RDR2       yes        -         -         -
I/O        RDR3       yes        -         -         -

3.3.5.1 Reads and Fills

The read address cycle, which is recognized by one of the three read commands (DREAD, IREAD, 
or OREAD) is decoded by the interfaces in the system, and the one which recognizes the address 
latches that address and command. This device is the responder. The responder uses Read Data 
Return or Read Data Error cycles to return the data. Reads and fills are described in the sections 
which follow. 



3.3.5.1.1 Dstream Read Requests (DREAD) 

An NDAL commander uses the DREAD command to request Data Stream data from a responder, 
either memory or an I/O device. 

3.3.5.1.2 Istream Read Requests (IREAD)

The IREAD command is used to request Instruction-Stream data from a responder, either memory
or an I/O device.

The separate I-stream read command is used in implementing halt protection for the CPU. When
a system device which asserts P%HALT_L recognizes an I-stream read in halt-protected space,
it prevents P%HALT_L from being asserted to the CPU. In the meantime, DREADs outside of
halt-protected space may occur. When an IREAD outside of halt-protected space happens, the
system device resumes asserting P%HALT_L to the CPU.

When NVAX issues the IREAD command in I/O space, it expects a full quadword of data in
return. The responding device may decode the IREAD command instead of the byte enable field
to detect the need to return a full quadword of data.

In addition, the separate IREAD command may be helpful in analysis during system debug or
for performance analysis.

3.3.5.1.3 Ownership Read Requests (OREAD) 

A node uses the OREAD command to gain ownership of a hexaword block of memory. Whereas 
previous systems implemented an Interlock read as well, the NDAL defines only the Ownership 
read. Interlocks can be accomplished using OREADs. 

OREADs are only defined for memory space; they are not used in I/O space.

When memory receives an ownership read, an "owned" bit is set in memory and the read data
is returned. Each hexaword in memory has an owned bit. The NVAX backup cache is organized
by hexawords also, with an owned bit for each hexaword. Memory clears the owned bit when a
Disown Write of any length is received to the same block.

3.3.5.1.4 How memory handles reads to Owned blocks

If the ownership bit is already set in memory when the OREAD arrives, data is not returned 
immediately to the commander. Once the node which owns the data Disown Writes the block, the 
Ownership bit is set in memory and the data is returned to the commander. The fact that the
ownership bit was set at the beginning of the reference is transparent to the commander on the 
NDAL. Once an OREAD is issued on the NDAL, the data must be returned to the commander 
without requiring any retry sequence. 

The analogous statement is true for an IREAD or a DREAD: If the ownership bit is already 
set in memory when the IREAD or DREAD arrives, data is not returned immediately to the 
commander. Once the node which owns the data Disown Writes the block, the data is returned
to the commander. The fact that the ownership bit was set at the beginning of the reference is 
transparent to the commander on the NDAL. Once an IREAD or DREAD is issued on the NDAL, 
the data must be returned to the commander without requiring any retry sequence. 

In certain error-handling situations, NVAX itself may issue a read to a block which it already 
owns. In this case the memory controller should handle the read as it normally would: wait until 
NVAX completes the WDISOWN, then return the read data to NVAX and set the ownership bit 
if the read was an OREAD. 
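
The behavior described in this section can be summarized with a small model of a single hexaword block in memory. It is a sketch only: the class and method names are hypothetical, and serving deferred reads in arrival order is an assumption of the sketch rather than a requirement stated here.

```python
from collections import deque


class MemoryBlock:
    """Toy model of one hexaword in memory with its owned bit."""

    def __init__(self):
        self.owned = False
        self.pending = deque()          # reads deferred while the block is owned

    def read(self, command, commander):
        """Handle IREAD, DREAD or OREAD; return the commanders that get data now."""
        if self.owned:
            # No retry is requested; the read simply waits for the Disown Write.
            self.pending.append((command, commander))
            return []
        return self._serve(command, commander)

    def _serve(self, command, commander):
        if command == "OREAD":
            self.owned = True           # the new commander now owns the block
        return [commander]

    def disown_write(self):
        """Handle WDISOWN: clear the owned bit, then serve deferred reads."""
        self.owned = False
        served = []
        # Assumption: deferred reads are served first-in, first-out, and stop
        # again as soon as a deferred OREAD takes ownership.
        while self.pending and not self.owned:
            command, commander = self.pending.popleft()
            served += self._serve(command, commander)
        return served
```

In this model a commander never sees whether the block was owned when its read arrived. It only sees the data return later than it otherwise would.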

3.3.5.1.5 Read cycle description and timing 

A read command cycle consists of a commander driving an address cycle on P%NDAL_H<63:0>,
as shown in Section 3.3.4. The commander drives P%CMD_H<3:0> with DREAD, IREAD, or
OREAD. It drives its own identification code on P%ID_H<2:0>, and it drives correct parity on
P%PARITY_H<2:0>.

The timing for a read cycle is shown in Figure 3-10. In this example, NVAX is doing a read. In 
Cycle 0, NVAX asserts P%CPU_REQ_L to request the NDAL. It is granted the bus immediately, 
as shown by the assertion of P%CPU_GRANT_L in cycle 0. (This example assumes that no 
other device was requesting the NDAL during this cycle.) 

The assertion of P%CPU_GRANT_L in phase 3 of cycle 0 means that NVAX is obligated to drive 
the NDAL in phase 1 of Cycle 1. It drives the read address out at that time. In this example, it 
deasserts its request line at the same time as it has no other requests to make. (It is not obligated 
to deassert request if it does have other requests to make.) 

The device receiving the read recognizes it in phase 3 of cycle 1, and computes parity across the 
data it received. In this example, it recognizes no parity error, and asserts P%ACK_L so that 
it is valid in phase 3 of cycle 2. The CPU receives P%ACK_L and knows that the read address
cycle completed successfully.

If there had been a parity error and P%ACK_L had not been asserted, NVAX would have
responded with an error condition as described in Section 3.3.10.3.

3.3.5.1.6 Read Data Return cycles (RDR0, RDR1, RDR2, RDR3)

The Read Data Return command is used in response to any read request, whether IREAD,
DREAD, or OREAD. Multiple cycles are necessary to transfer all of the quadwords in a given 
hexaword transaction, and the cycles are not required to be consecutive. The commander, which 
has been monitoring the bus traffic waiting for its return data, latches the information. The 
responder returns the commander ID with the returned read data so the commander can recognize 
the returned read data it requested. 

For a hexaword read, the four fill quadwords may be returned in any order. The NDAL Read
Data Return command identifies the location of each quadword within the natural boundary as 
it is returned so that it can be placed in the correct location regardless of the return order. The 
data which is returned is naturally aligned within each quadword. 

In I/O space, only one cycle's worth of data is returned. The actual amount of valid data 
returned depends upon the byte enable which was issued with the read request, as described 
in Section 3.3.4.1.2. The Read Data Return command corresponding to the requested I/O space 
address is used in returning the data. 

Read Data Return cycles do not have to occur in adjacent cycles. The requested quadword should 
be returned as soon as possible, for performance reasons, even if the remaining quadwords are 
not yet available. The remaining quadwords may be sent as they become available. 

Because the NDAL is a pended bus, multiple reads may be outstanding at a time. Because Read 
Data Return cycles do not have to occur contiguously, it is possible for Read Data Return cycles 
resulting from different read requests to take place in an interleaved fashion. 

Table 3-17 shows the correspondence between address bits <4:3> and the RDR command used in
returning data at that address. (Bits <4:3> indicate the alignment of a quadword of data within 
a hexaword.) The RDR command must correspond to the address of the data being returned for 
transactions of all lengths, whether quadword, octaword or hexaword. The correct RDR command 
must be used for both memory space and I/O space. 

Table 3-17: RDR usage for ALL fill cycles

Address
bits <4:3>    Command used for fill cycle

00            RDR0
01            RDR1
10            RDR2
11            RDR3
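
Because the RDR command itself names the quadword, a commander can place fill data correctly whatever the return order. The following Python sketch shows the idea; `rdr_command` and `FillBuffer` are hypothetical names used only for this illustration.

```python
def rdr_command(address):
    """Select the Read Data Return command from address bits <4:3>."""
    return f"RDR{(address >> 3) & 0b11}"


class FillBuffer:
    """Collects the four quadwords of one hexaword fill in any arrival order."""

    def __init__(self):
        self.quadwords = [None] * 4

    def accept(self, command, data):
        if command not in ("RDR0", "RDR1", "RDR2", "RDR3"):
            raise ValueError(f"not a Read Data Return command: {command}")
        # The digit in the command is the quadword's position in the hexaword.
        self.quadwords[int(command[3])] = data

    def complete(self):
        return all(quadword is not None for quadword in self.quadwords)
```

A responder returning the quadword at an address whose bits <4:3> are 10 uses `rdr_command` to pick RDR2; the commander's `accept` then stores that data in position 2 even if it arrives first.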



3.3.5.1.7 Read data error cycles (RDE) 

RDE is used to notify a commander of a problem with read data which is being returned. For 
example, the memory interface may use this command when it encounters an uncorrectable read 
error.

Once a Read Data Error cycle is sent for a particular read, no further read responses may be
sent for that transaction. The following sequence illustrates the series of events during a return
data of hexaword length containing an uncorrectable read error. In this example, HOLD is used
to return the data in consecutive cycles.

Figure 3-11: RDE example

            0      1      2      3      4      5
Arb       | resp | HOLD | HOLD |      |      |
CMD_H     |      | RDR0 | RDR1 | RDE  |      |
NDAL_H    |      | data | data |      |      |
ID_H      |      | cmdr | cmdr | cmdr |      |
ACK_L     |      |      | ACK  | ACK  | ACK  |



3.3.5.1.8 Read data cycle description and timing 

During a read data cycle, P%CMD_H<3:0> is driven to the value representing RDR0, RDR1,
RDR2, RDR3, or RDE. P%NDAL_H<63:0> is driven with the quadword of read data being
returned. P%ID_H<2:0> is driven with the ID which was issued with the original read request.
Correct parity is driven on P%PARITY_H<2:0>.

The timing for a Read Data Return cycle is shown in Figure 3-12. In this example, IO1_NODE
has fill data to return. In cycle 0, IO1_REQ_L is asserted to request the bus, and IO1_GRANT_L
is asserted in response. Since IO1_GRANT_L was asserted in cycle 0, IO1_NODE is obligated to
drive the NDAL in cycle 1. It does so and returns the fill data. The original requestor of the data
receives the data at the beginning of phase 3 of cycle 1, and since it detects no parity errors, it
asserts P%ACK_L so that it is valid in phase 3 of cycle 2, as shown.

3.3.5.1.9 Read Transaction Examples 

3.3.5.1.9.1 Quadword Read and Fill 

A quadword read consists of a command transfer followed by a return data transfer as shown 
below:

Figure 3-13: Quadword Read and Fill

            0      1      2             4      5      6
Arb       | cmdr |      |      | ... | resp |      |      |
CMD_H     |      | read |      | ... |      | RDR2 |      |
NDAL_H    |      | addr |      | ... |      | data |      |
ID_H      |      | cmdr |      | ... |      | cmdr |      |
ACK_L     |      |      | ACK  | ... |      |      | ACK  |



The two transfers are the read command and the Read Data Return. The CPU commander 
arbitrates for the NDAL in cycle 0, and wins. In cycle 1 it drives the command and address of 
the read, and its own ID (for use later to identify the returning data). In cycle 2 the receiver for 
that cycle asserts P%ACK_L if no parity error was detected on the bus. 

Sometime later (call it cycle 4) the return data transfer begins with the responder arbitrating for 
the NDAL. Having won it, in cycle 5 it drives the command, the data, and the commander's ID. 
The status of the returning data is specified in the read response code: either Read Data Return 
or Read Data Error. In this example, the quadword requested was to quadword 2 of a hexaword, 
so the RDR2 command is used in returning the data. 

The commander monitors the NDAL and checks for an ID match during Read Data Return cycles. 
An ID match indicates that the read data is meant for that commander. In cycle 6, the commander 
asserts P%ACK_L if it detected no parity error during the previous NDAL cycle. 

3.3.5.1.9.2 Multiple Quadword Reads

The only type of multiple quadword read which is used by NVAX is the hexaword read. Octaword 
reads are also supported by the NDAL protocol but are not issued by the NVAX CPU. These 
read transactions move multiple quadwords of data from the responder to the commander. The 
command transfer of the transaction is shown below. 

Figure 3-14: Read command on the NDAL

Arb       | cmdr |      |      |
CMD_H     |      | read |      |
NDAL_H    |      | addr |      |
ID_H      |      | cmdr |      |
ACK_L     |      |      | ACK  |



The following sequence illustrates the response to a hexaword read. In this example, quadword 
1 of the hexaword was the requested quadword, so Read Data Return 1 is the command 
accompanying the first data to return. The requested quadword is returned first for performance 
reasons, although that is not required by NVAX or the NDAL.

            0      1      2      3      4      5      6      7
Arb       | resp |      | resp |      | resp |      | resp |      |
CMD_H     |      | RDR1 |      | RDR0 |      | RDR3 |      | RDR2 |
NDAL_H    |      | data |      | data |      | data |      | data |
ID_H      |      | cmdr |      | cmdr |      | cmdr |      | cmdr |
ACK_L     |      |      | ACK  |      | ACK  |      | ACK  |      | ACK



The transfer above moves four quadwords of data. The command field of the NDAL in cycles 1,
3, 5, and 7 says Read Data Return with the P%ID_H field identifying the intended receiver (the
transaction commander). Each cycle provides a new quadword of read data and the P%ID_H
remains unchanged.

The example shows no transactions interleaved with the Read Data Return cycles, but it is 
entirely possible for non-related transfers to be taking place in the cycles between the fill cycles 
for one read. 

Read data may be returned in continuous cycles, if desired, through the use of the hold arbitration 
signals (see example below). The transmitter asserts its hold line in the first cycle to ensure that 
it maintains use of the NDAL long enough to complete the transfer. The hold lines are the highest 
priority arbitration lines and thus guarantee access. An interface is constrained to a maximum 
of four consecutive cycles in which it can assert its hold line. 

Figure 3-16: Read data return using HOLD

            0      1      2      3      4
Arb       | resp | hold | hold | hold |      |
CMD_H     |      | RDR2 | RDR3 | RDR0 | RDR1 |
NDAL_H    |      | data | data | data | data |
ID_H      |      | cmdr | cmdr | cmdr | cmdr |
ACK_L     |      |      | ACK  | ACK  | ACK  | ACK

3.3.5.2 Writes

3.3.5.2.1 Normal Write Transactions (WRITE) 

These transactions are used to move a pattern of bytes from an NDAL commander to one of 
the responders. The byte enable functionality is only used for quadword and octaword length 
transactions. In any hexaword write, all bytes are written regardless of the byte enable values. 

Parity must be correct for all bytes sent from any node, as NVAX checks parity across the entire 
NDAL during every cycle. 

If NVAX sees a write on the NDAL, it treats it as an invalidate request. A block invalidate is 
done if it is valid in the cache. A writeback is done if the block is owned. 

3.3.5.2.2 Disown Write Transactions (WDISOWN) 

The Disown Write transaction is the complement to the Ownership Read. After NVAX successfully 
gains ownership of a block in memory, it must relinquish ownership when another node wants 
ownership of the block or when the Bcache needs to do a deallocate. NVAX accomplishes this by 
performing a Disown Write to the memory with the latest copy of the data. The memory, which 
has been monitoring the bus traffic, notices that the transaction requested is a Disown Write. 
This condition allows it to clear the ownership bit in memory and to write the data as requested. 

NVAX uses the Disown Write command of hexaword length to perform writebacks from the backup 
cache. When the cache is off, it uses quadword Disown Writes to achieve the effect of a Write 
Unlock. 

3.3.5.2.3 Write Data and Bad Write Data (WDATA, BADWDATA)

The Write Data command is used during the data cycles of a write if the data is good. If the data 
has been corrupted in some way, for instance, there were uncorrectable errors in a cache which 
was storing the data, the command used is Bad Write Data. 

When one quadword of a hexaword Write Disown is bad, the Bad Write Data command is only 
used for that quadword. The Write Data command is used for the good quadwords. The memory 
can use this information to distinguish which quadword of a hexaword block is bad. In addition, 
P%S_ERR_L may be asserted when the Bad Write Data command is used, to notify NVAX of 
the error. 

3.3.5.2.4 Write transaction description and timing 

In a Write transaction, a commander gains the NDAL and sends an address cycle. In this
cycle, P%CMD_H<3:0> is driven to the value for WRITE. P%NDAL_H<63:0> is driven with the
address, the transaction length, and byte enable. P%ID_H<2:1> is driven with the commander's
identification code, and P%ID_H<0> is driven with any value.

The commander immediately follows this cycle with one to four consecutive cycles of write data,
depending on the length specified. In these cycles, P%CMD_H<3:0> is driven with either the
WDATA command or the BADWDATA command. P%NDAL_H<63:0> is driven with the write
data. P%ID_H<2:1> is driven with the commander's identification code, and P%ID_H<0> must
be driven, but may be driven with any value, as long as the parity is correct.

All interfaces on the NDAL decode the address, and the one that recognizes the address becomes
the responder and asserts P%ACK_L. The responder accepts the command, address, and data 
and performs the requested write. 

For quadword and octaword length transactions to memory space, the byte enable field that 
accompanies each command and address is completely unrestricted. Each bit in the 16-bit byte 
enable field corresponds to a byte of data in the associated quadword or octaword. If the bit is 
0, that byte must not be written; if the bit is 1, that byte must be written. For hexaword write 
transactions, the responder ignores the byte enable and writes all 32 bytes. 
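
The byte enable rule for memory space writes can be expressed as a short sketch of a responder's write path. The function name `apply_write` and the use of a `bytearray` as memory are choices made for this illustration. It assumes that bit n of the byte enable corresponds to the byte at offset n from the naturally aligned base of the transaction.

```python
LENGTH_IN_BYTES = {"quadword": 8, "octaword": 16, "hexaword": 32}


def apply_write(memory, base, data, byte_enable, length):
    """Write `data` into `memory` at `base`, honoring the byte enable.

    For quadword and octaword writes only bytes whose enable bit is 1 are
    written. For hexaword writes the byte enable is ignored.
    """
    size = LENGTH_IN_BYTES[length]
    if len(data) != size:
        raise ValueError("data length does not match transaction length")
    for offset in range(size):
        enabled = length == "hexaword" or (byte_enable >> offset) & 1
        if enabled:
            memory[base + offset] = data[offset]
```

A quadword write with byte enable 0001 1110 therefore changes only bytes 1 through 4 of the quadword and leaves the other four bytes in memory untouched.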

For I/O space transactions, the byte enable is used as indicated in Section 3.3.4.1.2. 

The timing for a quadword write on the NDAL is shown in Figure 3-17. In cycle 0, NVAX
requests the bus for the write by asserting P%CPU_REQ_L. In this example, no higher priority 
request is pending, so NVAX is granted the bus right away, in cycle 0. NVAX then drives the 
write command and address in cycle 1, and asserts P%CPU_HOLD_L at the same time in order 
to retain the bus. In cycle 2 the write data is driven. 

Assuming there are no parity errors, P%ACK_L is asserted by the receiver in cycle 2. This is in 
response to the address cycle of cycle 1. In cycle 3, which is not shown, P%ACK_L is asserted 
for the data cycle, cycle 2. 

3.3.5.2.5 Write Transaction Examples 

3.3.5.2.5.1 Quadword Writes 

Quadword writes move some number of bytes from the commander to the responder as specified 
by the byte enable field. The commander arbitrates as usual and upon winning the NDAL, drives 
the appropriate write command, the intended address, the data byte enable, and its own ID and 
asserts its hold line to signal that it will need the next cycle also. In cycle two, it identifies the 
cycle as a Write Data Cycle and provides the write data. If an NDAL parity error is detected on 
cycle 1 or 2, it is signaled in cycle 2 or 3 by withholding the assertion of P%ACK_L.

The cycle timing for a quadword write is shown in Figure 3-18.

            0      1      2      3      4
Arb       | cmdr | HOLD |      |
CMD_H     |      | writ | wdat |
NDAL_H    |      | addr | data |
ID_H      |      | cmdr | cmdr |
ACK_L     |      |      | ACK  | ACK



3.3.5.2.5.2 Multiple Quadword Writes 

The only multiple-data-cycle write issued by the NVAX CPU is the Hexaword Disown Write. 
Hexaword writes are similar to quadword writes except for the amount of data moved. The byte 
enable must be ignored in hexaword write transactions and all the bytes of the hexaword must 
be written. 

The first cycle of a hexaword write is identified with the length desired; successive cycles are 
identified as write data cycles. The hold line remains asserted, maintaining use of the NDAL for 
the commander. 

The four quadwords of data within the hexaword must be issued in order from lowest address to 
highest address. The order then is quadword 0, quadword 1, quadword 2, quadword 3. (Address 
bits <4:3> determine the position of a quadword within a hexaword.) Unlike fill data cycles, the 
same command, WDATA, is issued for every write data cycle, so the order in which the data is 
issued is essential so that it is written to the correct address in memory. 
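
The ordering rule for hexaword write data can be sketched as follows. This is an illustrative Python model only: the function names are hypothetical, the tuple representation of a bus cycle is an assumption, and the alignment check is an assumption about what the address cycle carries.

```python
def quadword_index(address: int) -> int:
    """Position of a quadword within its hexaword, from address bits <4:3>."""
    return (address >> 3) & 0b11


def hexaword_write_cycles(base: int, data: bytes) -> list:
    """Return the (command, payload) cycles of a Hexaword Disown Write.

    Assumption: `base` is the hexaword-aligned address of the block.
    The address cycle comes first, followed immediately by four WDATA
    cycles carrying quadword 0 through quadword 3 in that order.
    """
    if base % 32 != 0 or len(data) != 32:
        raise ValueError("expected a hexaword-aligned address and 32 bytes")
    cycles = [("WDISOWN", base)]
    for qw in range(4):  # lowest address first
        cycles.append(("WDATA", data[8 * qw:8 * qw + 8]))
    return cycles
```

Because every data cycle carries the same WDATA command, a receiver can only place the nth data cycle at base + 8n by counting cycles, which is why the issue order matters.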

A hexaword write is shown in Figure 3-19. 

Figure 3-19: Hexaword write on the NDAL 

           0      1      2      3      4      5 

Arb     | cmdr | hold | hold | hold | hold |      | 
CMD_H   |      | wrt  | wdat | wdat | wdat | wdat | 
NDAL_H  |      | addr | dat0 | dat1 | dat2 | dat3 | 
ID_H    |      | cmdr |      |      |      |      | 
ACK_L   |      |      | ACK  | ACK  | ACK  | ACK  | ACK 



NOTE 

The write data must always immediately follow the write address cycle with no NULL 
cycles in between. 

The NDAL protocol also allows for octaword writes. The NVAX CPU does not use these, but they 
may be used by other nodes. 

The two quadwords of data within the octaword must be issued in order from lowest address to 
highest address. The order then is quadword 0, quadword 1. (Address bit <3> determines the 
position of a quadword within an octaword.) 
An octaword write is shown in Figure 3-20. 

Figure 3-20: Octaword write on the NDAL 

           0      1      2      3      4      5 

Arb     | cmdr | hold | hold |      | 
CMD_H   |      | writ | wdat | wdat | 
NDAL_H  |      | addr | dat0 | dat1 | 
ID_H    |      | cmdr |      |      | 
ACK_L   |      |      | ACK  | ACK  | ACK 



3.3.5.3 NOPs 

For implementation reasons, occasionally NVAX will arbitrate for the NDAL and, if the bus is 
granted, it will drive a NOP. This only happens when NVAX has just driven out two back-to-back 
transactions. This happens rarely, and since NVAX has the lowest priority of the NDAL nodes, 
it is not a performance problem. 
3.3.6 Cache Coherency 

Ownership Reads and Disown Writes on the NDAL are intended to support writeback caches by 
attaching an owner status to each block in physical memory. A block in memory is defined as 
a hexaword, or 32 bytes. A node which owns a block may write it repeatedly without accessing 
memory. Only one node owns a given block. Ownership is passed from memory to a non-memory 
node through an Ownership Read command. Ownership is passed from non-memory nodes to 
memory through a Disown Write command. 

The ownership bits in the caches and in memory indicate that a cache owns the block. The 
ownership bit in the writeback cache is set when the cache owns the block and is clear when the 
cache does not own the block. The ownership bit in memory is set when some cache owns the 
block and clear when memory owns the block. 

Shared read-only access to a block is permitted only when memory owns it. Otherwise the block 
can only be read by the node which owns the block. 

NVAX nodes with writeback caches can gain ownership and retain it for a very long time. NVAX 
monitors the bus continuously for memory space read-type and write commands to memory space 
by other nodes. When NVAX detects a request for a block that it owns, it will perform the disown 
write to memory, allowing the original command to complete successfully. 

Table 3-18 shows what action is performed in the backup cache based upon the state of the block 
in the cache when a particular command is received. 

Table 3-18: NVAX Backup Cache Invalidates and Writebacks 

NDAL Command    Invalid Block    Valid & Unowned    Valid & Owned 
IREAD, DREAD    -                -                  Writeback, set Bcache to valid-unowned state 
OREAD           -                Invalidate         Writeback, Invalidate 
WRITE           -                Invalidate         Writeback, Invalidate 
WDISOWN 
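
The entries of Table 3-18 can be expressed as a simple lookup. The following Python sketch is illustrative only: the state names, the action strings, and the function name are hypothetical labels chosen for this example, and the WDISOWN row is left out because the table lists no actions for it.

```python
# (command, block state) -> ordered list of backup cache actions, per Table 3-18.
BCACHE_ACTIONS = {
    "IREAD": {"invalid": [], "valid-unowned": [],
              "valid-owned": ["writeback", "set valid-unowned"]},
    "DREAD": {"invalid": [], "valid-unowned": [],
              "valid-owned": ["writeback", "set valid-unowned"]},
    "OREAD": {"invalid": [], "valid-unowned": ["invalidate"],
              "valid-owned": ["writeback", "invalidate"]},
    "WRITE": {"invalid": [], "valid-unowned": ["invalidate"],
              "valid-owned": ["writeback", "invalidate"]},
}


def bcache_action(command: str, state: str) -> list:
    """Return the actions taken for an NDAL command seen from another node."""
    try:
        return list(BCACHE_ACTIONS[command][state])
    except KeyError:
        raise ValueError(f"no entry for {command!r} in state {state!r}") from None
```

In this model a DREAD to an owned block returns ["writeback", "set valid-unowned"], so the block stays in the cache as a read-only copy, while an OREAD or WRITE to the same block removes it from the cache after the writeback.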



Some devices other than NVAX will access memory directly over the NDAL. As these commands 
go to memory, NVAX recognizes the command and performs the appropriate cache coherency 
action. NVAX does not acknowledge the commands as the memory interface is the receiver for 
the transaction. NVAX distinguishes cycles driven by devices other than itself by decoding the 
value driven on P%ID_H<2:0> for the cycle, and recognizes those as cache coherency transactions. 

In some systems, such as the XMI2 system, there is a system bus to which multiple NVAX CPUs 
are interfaced. In these systems, memory commands which occur on the system bus must be 
driven into the NDAL so that NVAX can respond to them as necessary with cache coherency 
actions. 

For example, if an OREAD happens on the XMI2, an OREAD must be driven onto the NDAL 
to trigger NVAX to write back the block if it owns it. However, there is no node on the NDAL 
which becomes a responder to a memory access transaction which is driven FROM the memory 
interface. The result is that P%ACK_L is not asserted to acknowledge such a transaction. This 
is not an error condition. 
For more detail on the specific cache coherency requirements in the XMI2 system, refer to 
Section 3.4.1. 

3.3.7 Interrupts 

The P%IRQ_L<3:0> lines provide a general-purpose interrupt request facility to interrupt the 
NVAX CPU. These lines are level-sensitive, NOT edge sensitive. Once a node asserts its interrupt 
line, it should keep it asserted until NVAX services the request. 

When NVAX receives an interrupt, it issues a read on the NDAL to one of four specified I/O space 
addresses. There is one address specified for each Interrupt Priority Level. This mechanism 
replaces the specific command, Read Interrupt Vector, which was used in previous systems. 

Read cycles to these specified I/O space addresses are monitored by all nodes which have an 
interrupt outstanding. The node which responds first with a Read Data Return transaction will 
deassert its interrupt request. 

Interrupting nodes on the NDAL do not have to deassert and reissue their interrupts after one 
node is serviced. The remaining nodes monitoring the bus see the return vector cycle and maintain 
their interrupt requests in anticipation of another NVAX I/O space read for an Interrupt Vector. 
If the common interrupt line remains asserted, NVAX will initiate another such cycle to be fielded 
by another first responder. 

Chapter 10 describes interrupts in detail. 

3.3.8 Clear Write Buffer 

Clear Write Buffer is used to force all writes in the processor to be delivered to memory. In 
previous systems, an explicit Clear Write Buffer command on the pin bus was used. The NDAL 
uses an I/O space address which may be read or written to indicate that write buffers should be 
cleared. 

The I/O space read is used when the CPU wishes an acknowledgement of the request. The CPU 
waits for the "read data" to return before continuing operation. The actual read data which is 
returned is meaningless except to allow the CPU to proceed. The I/O space read does not complete 
until all previous writes are complete. This mechanism may be used during a process context 
switch to force any errors associated with previous writes to happen in the context of the current 
process before the process context switch actually occurs. 

The device which responds with read data to the Clear Write Buffer is system dependent. In 
theory it would be memory, since memory responding would indicate that all buffers before 
memory had been cleared. 

The I/O space write which serves as Clear Write Buffer is used when the process mode changes 
but the process is not being switched. Here the purpose is to flush the writes as fast as possible 
when the mode changes, and to flush them ahead of any subsequent reads. Because the mode is 
changed often, it would be a performance hit to use the CWB read and to have to wait for the 
read data to return. Therefore the Clear Write Buffer is done as a write. 

When the Cbox receives the clear write buffer command from the Mbox, it flushes its write queue. 
The writes are delivered to the backup cache, since it is writeback, rather than directly to memory. 
The I/O space clear write buffer command, whether a read or a write, is then issued on the NDAL. 
3.3.9 VAX architecturally-defined interlocks 

A VAX interlocked instruction causes the generation of a Read-Lock and a Write-Unlock which are 
guaranteed to happen back-to-back. The NDAL does not explicitly define interlocked transactions. 
Instead, the Ownership Read command is used in place of Read Lock and the Disown Write 
command is used in place of Unlock Write. 

If the interlocked location is already owned in the backup cache when the Cbox receives the read 
lock from the Mbox, the command is never seen on the NDAL as it is serviced directly on the 
cache. Writeback of the block is prevented until the write unlock is issued from the Ebox. 

3.3.9.1 Ownership and Interlock transactions 

If NVAX has a read lock in progress and P%CPU_WB_ONLY_L is asserted, the Cbox issues 
the write unlock regardless of the assertion of P%CPU_WB_ONLY_L. Otherwise, deadlock might 
occur if P%CPU_WB_ONLY_L were asserted and a device in the system was waiting for NVAX 
to do a Write Unlock before deasserting P%CPU_WB_ONLY_L. For example, memory would not 
return Read Lock data to an I/O device if the ownership bit were set. 

The NVAX CPU does not support interlocks to I/O space. If the Cbox receives an interlock to I/O 
space, it converts it to a normal read on the NDAL. 
3.3.10 Errors 

The NDAL supports the detection of all single-bit and some multiple-bit transmission related error 
conditions on the P%NDAL_H, P%CMD_H, P%ID_H, and P%PARITY_H lines by implementing 
parity across those lines. Additionally, the NDAL allows commanders to recover from some 
memory and I/O-space read/write class errors. 

3.3.10.1 Transaction Timeout 

Each NDAL node must implement a timeout counter for each read which it may have outstanding. 
The NVAX Cbox implements two timeout counters, one for each possible outstanding read. If a 
read request times out, it is aborted by the Cbox. Any missing Read Data Return cycles will 
eventually cause that read to timeout in the Cbox. See Table 3-21 for details on how timeout is 
handled. 

The NVAX BIU starts its read timeout counter when it receives P%ACK_L assertion for the read. 
The counter is an 8-bit counter which, in normal operation, is clocked with a signal from the Ebox, 
E%TIMEOUT_ENABLE_H. The base counter in the Ebox is 16 bits wide. This implementation results 
in the timeout values shown in Table 3-19. 

Table 3-19: NVAX Read Timeout Values in Normal Mode 

NVAX chip speed    Timeout Granularity    Read timeout 
10-ns NVAX         655 microseconds       167 milliseconds 
12-ns NVAX         786 microseconds       200 milliseconds 
14-ns NVAX         917 microseconds       234 milliseconds 


A test mode for the NVAX read timeout counters is provided, and is described in detail in 
Chapter 13. In test mode, the read timeout counters are run directly from the internal NVAX 
clock, rather than from E%TIMEOUT_ENABLE_H. The test mode timeout values are shown in 
Table 3-20. 


Table 3-20: NVAX Read Timeout Values in Test Mode 

NVAX chip speed    Timeout Granularity    Read timeout 
10-ns NVAX         10 nanoseconds         2.5 microseconds 
12-ns NVAX         12 nanoseconds         3.0 microseconds 
14-ns NVAX         14 nanoseconds         3.5 microseconds 



The occurrence of transaction timeout is not normal and is expected to happen only when the 
system is broken. 

More information on timeout may be found in Chapter 13. 
3.3.10.2 Non-existent memory and I/O 

An address which is not implemented in memory on a particular system is known as non-existent 
memory. An I/O address of a device which is not present on a particular system is known as 
non-existent I/O. 

Devices on the NDAL must acknowledge any transactions to address space which they recognize 
by asserting P%ACK_L (except when there is a parity error). An address which is not recognized 
by any NDAL device is not acknowledged. 

If P%ACK_L is not asserted in response to an NVAX request, the Cbox records the error by 
saving state in its error registers. (This error case is covered in Table 3-21.) Software can read 
the error registers in the other NDAL nodes and find that the absence of P%ACK_L was not 
due to a parity error on the NDAL. From that information it can deduce that the problem was 
non-existent memory or I/O. 

If an interface between the NDAL and another bus recognizes a read to some address and ACKs 
it, then finds that the address is not implemented on the other bus, the interface must use RDE 
to terminate the READ on the NDAL. It must not simply let the read time out, as this method 
of terminating the transaction takes much longer. 

If an NDAL device ACKs a write, then determines that it was to non-existent memory or I/O 
space, it should notify the CPU appropriately. One possibility is to assert P%H_ERR_L. 

3.3.10.3 Error Handling 

This section describes the required behavior of NDAL commanders and responders in reaction to 
error conditions. 

In general, NDAL errors are handled as follows: 

• Null cycles have correct parity but are not acknowledged. The absence of P%ACK_L assertion 
for these cycles is not an error condition. 

• Any NDAL receiver detecting bad parity in any field on a non-NULL cycle must ignore the 
cycle. P%ACK_L must not be asserted and no action should be taken in response to the NDAL 
command. The receiver may log the error. The device which drove the NDAL cycle must log 
the error (the absence of P%ACK_L assertion) and notify NVAX in some way, depending on 
the exact situation. 

• If an NDAL responder returns Read Data Error for one quadword of return data, it must not 
send any further quadwords of data for that request. If any further fills are received, the 
Cbox treats them as unexpected fills as described in Table 3-21. 

• On an ownership read, the memory should set the ownership bit as soon as it starts sending 
data back to the requestor. The NVAX backup cache does not set its ownership bit until it 
receives all the data for the block, so if any fill data is lost, the block will appear not to be 
owned by any element in the system. This simplifies error handling if NVAX did the OREAD 
because of a write, and the write data has already been written into the cache when the error 
occurs. No other device can get access to the block while the error is being handled. 
• An NDAL memory node may not clear its ownership bit unless all write data cycles associated 
with the Disown Write transaction are properly received. If write data is sent with the 
BADWDATA command, it is considered to be properly received. 

The NVAX BIU does not retry failed commands on the NDAL. 

If the Cbox recognizes that data has been lost, it asserts C%CBOX_H_ERR_H to the Ebox. (In some 
cases, the data may be recoverable by software.) When C%CBOX_H_ERR_H is asserted, the Cbox 
always puts the Bcache into Error Transition Mode. 

The Cbox asserts C%CBOX_S_ERR_H when it recognizes a soft error. A soft error does not 
necessarily interfere with code running on the machine. In some cases, the Cbox enters ETM 
upon recognizing a soft error. 

Table 3-21 shows the response of the NVAX CPU for every error situation. 




NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 



Table 3-21: NDAL Errors and NVAX CPU Error Responses 

General Problem 
    Specific situation 
        Action taken by NVAX CPU 

NVAX detects parity error on any NDAL cycle 

    P%ACK_L asserted (inconsistent parity error) 
        Cbox asserts C%CBOX_S_ERR_H, puts backup cache into Error 
        Transition Mode. An invalidate or writeback request may 
        have been missed. 

    P%ACK_L not asserted (parity error) 
        Cbox asserts C%CBOX_S_ERR_H, puts backup cache into Error 
        Transition Mode. An invalidate or writeback request may 
        have been missed. (1) 

P%ACK_L not asserted for NVAX-originated command 

    IREAD, DREAD (to memory or I/O) 
        Cbox aborts the read in the Cbox and the Mbox (2), asserts 
        C%CBOX_S_ERR_H. 

    OREAD 
        Cbox aborts the read in the Cbox and the Mbox, enters 
        Error Transition Mode, and asserts C%CBOX_S_ERR_H. If the 
        OREAD was done because of a write miss, the write will 
        now be done straight to memory since the cache is in ETM. 

    WRITE or WDISOWN, address cycle or data cycle (to memory or I/O) 
        Cbox asserts C%CBOX_H_ERR_H, enters Error Transition Mode. 
        Data which should have been written to memory has been 
        lost. If the error was on the data cycle and, in the system 
        implementation, memory marks the data bad, software 
        may choose to ignore the hard error response since the error 
        will be detected when/if the data is read. NVAX continues 
        to send the WDATA cycles even if the address cycle or one 
        of the WDATA cycles is NAKed. 

Read timeout or Read Data Error before requested quadword is received 

    IREAD, DREAD (memory or I/O space) 
        Cbox aborts the read in the Cbox and the Mbox, asserts 
        C%CBOX_S_ERR_H. 

    OREAD 
        Cbox aborts the read in the Cbox and the Mbox, asserts 
        C%CBOX_S_ERR_H, enters Error Transition Mode. The Cbox 
        does not set the ownership bit in the cache. If memory has 
        set its ownership bit, there is no record of ownership for 
        the block in the system; however, software can analyze and 
        clean up the problem by reading the Cbox error registers. 
        If the OREAD was done because of a write miss, the write 
        will now be done straight to memory since the cache is in 
        ETM. 

(1) In some systems, such as the NVAX XMI-2 system, commands may be sent on the NDAL purely to notify NVAX of an 
invalidate request; these commands are not acknowledged. 

(2) The Cbox aborts the read in the Cbox by clearing the valid bit in the FILL_CAM; it aborts the read in the Mbox by asserting 
C%CBOX_HARD_ERR_H with the I_CF or D_CF command. 
Table 3-21 (Cont.): NDAL Errors and NVAX CPU Error Responses 

General Problem 
    Specific situation 
        Action taken by NVAX CPU 

Read timeout or Read Data Error after requested quadword successfully received 

    IREAD, DREAD 
        Cbox aborts the read in the Cbox and the Mbox, asserts 
        C%CBOX_S_ERR_H, does not validate cache entry. 

    OREAD for a read-modify or a read-lock 
        Cbox aborts the read in the Cbox and the Mbox, asserts 
        C%CBOX_S_ERR_H, enters Error Transition Mode. The block 
        is not validated or marked owned in the backup cache. 
        Depending on system implementation, the ownership bit 
        may be set in memory. If the OREAD was for a 
        read-modify, software can analyze and correct the potential 
        inconsistency in ownership information by reading the 
        Cbox error registers. If the OREAD was for a read-lock, 
        the write unlock will follow to memory (as a quadword 
        disown write) after the Cbox handles the error. If the 
        memory subsystem has set its ownership bit, this write 
        unlock preserves consistency in ownership in the memory 
        subsystem; if not, the write unlock location appears to 
        be owned by memory and will be handled as an error by 
        memory. 

    OREAD for a write 
        Cbox aborts the read in the Cbox, asserts C%CBOX_S_ERR_H, 
        enters Error Transition Mode. The write was previously 
        done into the cache when the requested quadword returned, 
        since the Cbox merges the write data with the fill data. 
        Since the read did not complete, the ownership bit is not set 
        in the cache even though the new write data is in the cache. 
        Software can recover the write data if it is non-shared data. 
        The backup cache must be flushed of owned data using the 
        deallocate register, then put into force hit mode. The data 
        can then be read and written to memory. If the data is 
        shared, writes to memory may have been done out of order 
        by the Cbox, and system integrity is in question. 

Read timeout or RDE on OREAD with pending writeback request 

        A pending writeback request is entered in the FILL_CAM 
        when a writeback request arrives for an outstanding 
        OREAD. If the OREAD does not complete successfully for 
        any reason, the writeback request is aborted. The Cbox has 
        not received the entire block, so it does not claim ownership 
        for the block. Therefore, it does not write back the block as 
        was requested. 
Table 3-21 (Cont.): NDAL Errors and NVAX CPU Error Responses 

General Problem 
    Specific situation 
        Action taken by NVAX CPU 

Unexpected fill or unexpected RDE received 

        If there is no corresponding FILL_CAM entry for a 
        returning fill or RDE, the Cbox ignores the fill data. 
        C%CBOX_H_ERR_H is asserted. The data is not placed in the 
        Bcache and not sent to the Mbox. (3) CEFSTS is loaded and 
        locked; the UNEXPECTED_FILL bit is set since the fill or 
        RDE was unexpected. 

(3) It is possible to create a scenario where an unexpected fill is received and is not recognized by the Cbox because there is an 
entry in the FILL_CAM which apparently corresponds to the fill. For example, suppose the Cbox starts READ A. READ 
A times out, so the Cbox aborts it and the corresponding FILL_CAM entry is cleared. Now the Cbox starts READ B using 
the same ID as the aborted READ A. Now, if memory returns read data for A, it apparently corresponds to the FILL_CAM 
entry for READ B. The data is accepted and NVAX is unknowingly operating with incorrect data. This behavior may 
cascade into READ C, READ D, etc., if the Cbox always has a new read outstanding by the time some unexpected data 
arrives. Eventually, however, the FILL_CAM entry will be empty when read data is returned, and the Cbox will recognize 
the error. Before the Cbox recognizes the condition, NVAX may have been behaving very strangely, as it has probably 
been operating with either wrong Dstream or wrong Istream data. 
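
The READ A / READ B scenario in the footnote above can be illustrated with a toy model. The Python sketch below is hypothetical throughout: the class and method names are invented for this example, a read is treated as complete after a single fill, and the model is given knowledge of which read the data really belongs to, which the hardware does not have.

```python
class FillCam:
    """Toy model of fill matching by transaction ID (illustration only)."""

    def __init__(self):
        self.entries = {}  # transaction ID -> name of the outstanding read

    def start_read(self, txn_id: int, name: str) -> None:
        self.entries[txn_id] = name

    def abort(self, txn_id: int) -> None:
        """Clear the entry, as on a read timeout."""
        self.entries.pop(txn_id, None)

    def fill(self, txn_id: int, data_for: str) -> str:
        """Classify a returning fill. `data_for` names the read the data
        actually belongs to; real hardware sees only the ID."""
        owner = self.entries.pop(txn_id, None)
        if owner is None:
            return "unexpected fill"
        if owner != data_for:
            return "accepted wrong data for " + owner
        return "accepted"


cam = FillCam()
cam.start_read(1, "A")
cam.abort(1)                # READ A times out
cam.start_read(1, "B")      # READ B reuses the same ID
print(cam.fill(1, "A"))     # late data for A is taken as the fill for B
print(cam.fill(1, "B"))     # the real data for B now finds no entry
```

The second fill is the point at which the error finally becomes visible, matching the footnote's observation that detection is delayed until read data returns to an empty entry.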
Each system which uses the NVAX CPU chip will develop its own error strategy. In general, 
enough information should be logged so that software can understand the problem. Table 3-22 
addresses system errors which the system designer should take into account. 



Table 3-22: NDAL Errors and Error Responses by System Components 

General Problem 
    Specific situation 
        Considerations to be made 

NDAL parity error and P%ACK_L asserted 

    Node has cache 
        Assert P%S_ERR_L and disable the cache. The node may 
        have missed an invalidate. 

    Node has no cache 
        Assert P%S_ERR_L. 

NDAL parity error and P%ACK_L not asserted 

        The lack of assertion of P%ACK_L is sufficient to notify 
        the transmitter of the cycle; that transmitter is responsible 
        for notifying the CPU of the error. If the transmitter lost a 
        write, it should assert P%H_ERR_L. 

    Any write or WDATA 
        The memory interface should not assert P%H_ERR_L 
        because it cannot tell who sent the write. It should log 
        the parity error. The transmitter which sent the write 
        asserts P%H_ERR_L or takes other actions to initiate 
        error recovery. 

    WDATA for a Disown Write 
        The memory should not clear its OBIT; this way, reads from 
        other CPUs will fail until software corrects the problem. 

WDISOWN to memory location which memory owns 

        Response is system dependent. Memory should probably 
        perform the write and log the error. 



3.3.10.4 Error Recovery 

In most cases an NDAL commander is permitted to reissue a failing transaction in order to 
recover from transient bus errors. Should the recovery fail (recovery may involve one or more 
reattempts of the failed transaction), then the commander logs a hard error. Implementation 
of error recovery is a system-dependent decision. This section contains guidelines on when a 
transaction may be retried. 

• All transactions which do not receive P%ACK_L assertion for the address cycle may be 
retried. 

• Any failing NDAL Write transaction may be retried. 

• Any failing Read to memory space may be retried. 

• Any failing I/O space Write transaction may be retried. 

• It is unsafe to retry any I/O space Read transaction receiving a response timeout since some 
I/O devices may have read side effects. 
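
The retry guidelines above can be summarized as a small decision function for commanders that choose to implement retry. This Python sketch is illustrative only: the function name and argument encoding are hypothetical, and the treatment of an acknowledged I/O space read that fails for a reason other than timeout is a conservative assumption, since the guidelines do not cover that case.

```python
def may_retry(kind: str, space: str, address_acked: bool) -> bool:
    """Apply the retry guidelines to a failed NDAL transaction.

    kind:  "read" or "write"
    space: "memory" or "io"
    address_acked: whether P%ACK_L was asserted for the address cycle
    """
    if not address_acked:
        return True   # the transaction was never accepted by a receiver
    if kind == "write":
        return True   # any failing write, memory or I/O space
    if kind == "read" and space == "memory":
        return True   # memory space reads have no side effects
    # Acknowledged I/O space read: unsafe after a response timeout because
    # the device may have read side effects. Other failures are not covered
    # by the guidelines, so this sketch declines to retry them as well.
    return False
```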

The NVAX CPU will not implement retry on any NDAL transactions. 
3.3.11 NDAL Initialization 

When the NVAX CPU chip enters the reset state, the BIU does the following: 

• Tristates P%NDAL_H<63:0>, P%CMD_H<3:0>, P%ID_H<2:0>, and P%PARITY_H<2:0>. 
This occurs when internal reset is asserted, and is not qualified with any clock. 

• Releases P%ACK_L. This occurs when internal reset is asserted, and is not qualified with 
any clock. 

• Deasserts P%CPU_REQ_L, P%CPU_HOLD_L, and P%CPU_SUPPRESS_L. This occurs 
when internal reset is asserted, and is not qualified with any clock. 

• P%CPU_GRANT_L and P%CPU_WB_ONLY_L are sampled during reset. 

While NVAX is asserting P%SYS_RESET_L, the NDAL clocks are running. P%SYS_RESET_L 
is deasserted relative to NDAL PHI12. 

During reset, some NDAL node must drive the NDAL so that it is driven with a NOP and good 
parity by the time P%SYS_RESET_L is deasserted. NVAX receives the NDAL during reset. The 
NDAL must be driven to valid levels with good parity by the time reset is deasserted, to prevent 
NVAX from detecting a parity error. The following is an example of how to drive the NDAL with 
a NOP, while putting valid parity on the bus: 

• Drive P%CMD_H<3:0> low (this is the NOP command). 

• Drive P%NDAL_H<63:0> low. 

• Drive P%ID_H<2:0> low. 

• Drive P%PARITY_H<2> low. 

• Drive P%PARITY_H<1> high. 

• Drive P%PARITY_H<0> low. 
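
The parity values in the NOP example above can be checked with a short calculation. The Python sketch below rests on assumptions: it takes the parity scheme recorded in Section 3.6 (even parity across the command, even parity across the lower longword, odd parity across the upper longword) and assumes that P%PARITY_H<2> covers the command and ID fields, P%PARITY_H<1> the upper longword, and P%PARITY_H<0> the lower longword. That bit assignment is inferred from the example, and the function names are hypothetical.

```python
def even_parity_bit(value: int) -> int:
    """Parity bit that makes the total count of 1s (field plus bit) even."""
    return bin(value).count("1") & 1


def odd_parity_bit(value: int) -> int:
    """Parity bit that makes the total count of 1s (field plus bit) odd."""
    return even_parity_bit(value) ^ 1


def ndal_parity(cmd: int, ndal: int, node_id: int) -> tuple:
    """Return (PARITY<2>, PARITY<1>, PARITY<0>) under the assumed coverage."""
    p2 = even_parity_bit((cmd & 0xF) | ((node_id & 0x7) << 4))  # command and ID
    p1 = odd_parity_bit((ndal >> 32) & 0xFFFFFFFF)              # upper longword
    p0 = even_parity_bit(ndal & 0xFFFFFFFF)                     # lower longword
    return p2, p1, p0


# NOP with all data and ID lines low, as in the example above.
print(ndal_parity(cmd=0b0000, ndal=0, node_id=0))
```

With every covered line low, only the odd parity bit must be driven high, which agrees with the example: P%PARITY_H<2> low, P%PARITY_H<1> high, P%PARITY_H<0> low.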

The NVAX CPU does not assert P%CPU_REQ_L until at least 4 NDAL cycles after 
P%SYS_RESET_L is deasserted. 

P%CPU_GRANT_L should be deasserted during system reset. NVAX will not drive the NDAL 
if granted during reset. 
3.4 The XMI-2 NVAX System 

A block diagram of the XMI-2 system is shown in Figure 3-21. Everything in the picture except 
memory, I/O, other CPUs, and the XMI-2 is contained on one module. 

The XMI-2 system is being developed by MSB and is a follow-on to the Mariah XMI-2 system. 

Figure 3-21: NVAX XMI-2 System Block Diagram 
3.4.1 Cache coherency in the XMI2 system 

Commands on the XMI2 must be forwarded to the CPU in order to maintain cache coherency. 
Table 3-23 shows the XMI2 commands and the corresponding command which must be forwarded 
on the NDAL to NVAX. The actions which NVAX takes as a result of the NDAL commands are 
shown in Section 3.3.6. 



Table 3-23: XMI2-NVAX Coherency Requirements 

XMI2 Command                      Resulting NDAL Command 
Read                              D-stream read 
Interlock Read, Ownership Read    Ownership Read 
Unlock Write, Write Masked        Write (1) 
Disown Write, Tag Bad Data        none 

(1) WDATA cycles for the write may be omitted since the write is driven onto the NDAL for cache coherency reasons only. 
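
The forwarding rule of Table 3-23 can be sketched as a lookup from XMI2 command to the NDAL cycles an interface would drive. This Python fragment is illustrative only: the function name is hypothetical, and the NDAL mnemonics DREAD, OREAD, and WRITE are used here as the assumed encodings of "D-stream read", "Ownership Read", and "Write".

```python
# XMI2 command -> NDAL command forwarded for coherency, or None if nothing is sent.
XMI2_TO_NDAL = {
    "Read": "DREAD",
    "Interlock Read": "OREAD",
    "Ownership Read": "OREAD",
    "Unlock Write": "WRITE",
    "Write Masked": "WRITE",
    "Disown Write": None,
    "Tag Bad Data": None,
}


def forward_to_ndal(xmi2_command: str) -> list:
    """Return the NDAL cycles driven for an XMI2 command.

    Only the address cycle is returned for writes, since the WDATA
    cycles may be omitted when the write is forwarded for cache
    coherency reasons only.
    """
    ndal_command = XMI2_TO_NDAL[xmi2_command]
    return [] if ndal_command is None else [ndal_command]
```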
Unlock Writes must be forwarded to the NDAL for the following case. Assume an I/O device does 
a Read Lock, Write Unlock to memory location A. Assume that the CPU wants to do a normal 
read to location A, and that it does not have A in its cache. Assume the following timing on the 
XMI: 

Figure 3-22: XMI2 Unlock Write example 

time    I/O device            CPU 
 |      Interlock Read A 
 |                            Read A 
 |      Unlock Write A 



If the CPU reads A between the Read Lock and the Write Unlock, the data the CPU caches should 
be invalidated after the write unlock. Otherwise, the CPU has stale data in its cache. This is 
because normal reads get data from XMI2 memory even if the location is interlocked. When 
writes are forwarded from the XMI2 to the NDAL, only the write address cycle must be driven. 
The write data cycles may be omitted. 
3.5 The Lowend NVAX System - OMEGA 

A block diagram of the lowend system, called Omega, is shown in Figure 3-23. The lowend system 
is being developed in Maynard, in ESB, the Entry Systems Business Group (formerly MVB). 

Figure 3-23: NVAX Lowend System Block Diagram 
The Lowend System implements an ownership bit in memory which is used to indicate that 
the NVAX CPU owns the block in its backup cache. This bit is covered by ECC. If an I/O 
interface issues a read or a write to a location which is owned by the NVAX backup cache, the 
memory interface holds the request until the writeback completes. It then completes the original 
transaction. The same applies to ownership transactions which may arrive from the NCA for an 
owned block of memory. 

The NCA uses the NDAL ownership transactions in order to perform interlocked transactions. 

One key problem in the Lowend System is the latency of a Qbus transaction. Once a device 
successfully issues a transaction on the Qbus, a timeout counter starts which will time out after 
8 microseconds. This timing is difficult to meet in an NVAX system because of the writeback 
cache. 

The analysis of the problem may be found in the specs for the Omega system. 
3.6 Resolved Issues 



1. Issue: Should we implement Force Bad Parity on the NDAL for testing purposes, or can 
we get away without it? Solution: We are implementing a way to force bad parity on the 
command field of the NDAL. 

2. Issue: The arbitration signals are not parity protected. Solution: This is not a problem 
because they are acknowledged by grant. The commander can always detect a problem by 
observing grant. If a request line is broken, the CPU will eventually timeout. 

3. Issue: Should the Cbox do retry on parity errors? Solution: No. The XMI has never seen 
a parity error and it is a much longer bus with big connectors which we don't have. Retry 
would add unnecessary complexity. 

4. Issue from Supnik: Allow space for extended addressing by moving byte enable over. Solution: 
Byte enable moved over. 

5. Issue: Should parity be even or odd or a combination of both? Solution: Use even parity 
across the command, even parity across the lower longword of the NDAL, and odd parity 
across the upper longword of the NDAL. The combination helps for package reasons - all pins 
can't drive the same way at once. (Steve Thierauf) 

6. Issue: Should the NDAL cycle time equal 2 or 3 CPU cycles? Solution: It will be much easier 
to design to 3 cycles so we'll do this in the interest of the schedule. 3 cycles may cost us 3% 
performance but it is worth it for ease of design. 

7. Issue: Should NVAX drive the lower three bits of address for I/O space transactions? Solution: 
Yes. It is in the critical path of I/O devices to deduce the address from the byte enable. 

8. Issue: If an Unlock Write transaction is directed to a location not currently locked, should 
the responder perform the write operation? Solution: This is a system-dependent issue. 
Recommendation added to the Errors section. 

9. Issue: Should we have an acknowledged I/O space write? This would preserve write ordering 
between memory writes and I/O space writes. Solution: Historically this problem has not 
been addressed so our solving it is no value added. Software can be written which avoids the 
problem. 

10. Issue: Do the low-end systems need byte parity? Solution: If a system is built without a 
backup cache, the performance is going to be poor, so doing the read-modify-write for masked 
writes to memory is OK. The low-end system will need to do read-modify-writes when the 
cache is in Error Transition Mode, but this is very rare. As long as there is time to compute 
longword parity it seems sufficient. Adding byte parity would increase the number of pins on 
the CPU and on all NDAL interfaces by 6. 

11. Issue: There was not enough time for the arbiter if HOLD was a single open-drain signal. 
Solution: Have three hold signals, one for each commander, each of which is point-to-point. 

12. Issue: Is parity enable necessary? If not, we get rid of a pin. Solution: Parity enable is not 
necessary. Every planned NVAX system is able to generate parity on every NDAL cycle. 

13. An additional command is under consideration. It would be called Disown Without Writeback 
(DISWOWB). It would be driven from the CPU to the memory interface after the CPU received 
a hexaword write to an owned block. DISWOWB indicates that the backup cache has given 
up ownership and invalidated the block, but is returning no data to memory. If a hexaword 
write is done in the system, memory has no use for the old data so it would be a waste of 
time for the CPU to return it. Solution: This command does not appear to be useful enough 
to warrant the complexity. 

14. Can the CPU chip remove the internal resistors on the NDAL? If we do, some chip in the 
system would have to pull the bus to valid levels during reset. Resolution: Yes, NVAX has 
removed the internal resistors. Another component in every NVAX system will pull the NDAL 
to valid levels during reset. 
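
The parity convention chosen in issue 5 can be sketched as follows. This is an illustration only: the 4-bit command width follows P%CMD_H<3:0>, but the function names, the assumption that exactly the command and the two NDAL longwords are covered, and the order of the returned tuple are choices made for this example and are not definitions from this specification.

```python
def ones(value: int) -> int:
    """Count the 1 bits in a non-negative integer."""
    return bin(value).count("1")


def even_parity_bit(value: int) -> int:
    """Parity bit that makes the total number of 1s (field plus parity) even."""
    return ones(value) & 1


def odd_parity_bit(value: int) -> int:
    """Parity bit that makes the total number of 1s (field plus parity) odd."""
    return (ones(value) & 1) ^ 1


def ndal_parity(cmd: int, ndal: int) -> tuple:
    """Return (command, lower-longword, upper-longword) parity bits.

    The tuple order is hypothetical; only the even/even/odd choice
    comes from the issue text.
    """
    lower = ndal & 0xFFFFFFFF
    upper = (ndal >> 32) & 0xFFFFFFFF
    return (even_parity_bit(cmd & 0xF),
            even_parity_bit(lower),
            odd_parity_bit(upper))
```

Under these assumptions, an all-zeros cycle and an all-ones cycle both yield parity bits that are not all equal, which is consistent with the package argument given in issue 5.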

3.7 NVAX Chip Interface Signal Name Cross-Reference 

All NVAX signal names and pin names referenced in this chapter have appeared in bold and reflect 
the actual name appearing in the NVAX schematic set. For each signal and pin appearing in this 
chapter, the table below lists the corresponding name which exists in the behavioral model. 



Table 3-24: Cross-reference of all names appearing in the NVAX chip interface chapter 

Schematic Name        Behavioral Model Name 

C%CBOX_H_ERR_H        C%CBOX_H_ERR_H 
C%CBOX_S_ERR_H        C%CBOX_S_ERR_H 
C%CBOX_HARD_ERR_H     C%CBOX_HARD_ERR_H 
E%TIMEOUT_ENABLE_H    E%TIMEOUT_ENABLE_H 
P%ACK_L               P%ACK_L 
P%ASYNC_RESET_L       P%ASYNC_RESET_L 
P%CMD_H<3:0>          P%CMD_H<3:0> 
P%CPU_GRANT_L         P%CPU_GRANT_L 
P%CPU_HOLD_L          P%CPU_HOLD_L 
P%CPU_REQ_L           P%CPU_REQ_L 
P%CPU_SUPPRESS_L      P%CPU_SUPPRESS_L 
P%CPU_WB_ONLY_L       P%CPU_WB_ONLY_L 
P%DISABLE_OUT_L       P%DISABLE_OUT_L 
P%DR_DATA_H<63:0>     P%DR_DATA_H<63:0> 
P%DR_ECC_H<7:0>       P%DR_ECC_H<7:0> 
P%DR_INDEX_H<20:3>    P%DR_INDEX_H<20:3> 
P%DR_OE_L             P%DR_OE_L 
P%DR_WE_L             P%DR_WE_L 
P%HALT_L              P%HALT_L 
P%H_ERR_L             P%H_ERR_L 
P%ID_H<2:0>           P%ID_H<2:0> 
P%INT_TIM_L           P%INT_TIM_L 
P%IRQ_L<3:0>          P%IRQ_L<3:0> 
P%MACHINE_CHECK_H     P%MACHINE_CHECK_H 
P%NDAL_H<63:0>        P%NDAL_H<63:0> 
P%OSC_H               P%OSC_H 
P%OSC_L               P%OSC_L 
P%OSC_TC1_H           P%OSC_TC1_H 
P%OSC_TC2_H           P%OSC_TC2_H 
P%OSC_TEST_H          P%OSC_TEST_H 
P%PARITY_H<2:0>       P%PARITY_H<2:0> 
P%PHI12_IN_H          P%PHI12_IN_H 
P%PHI12_OUT_H         P%PHI12_OUT_H 
P%PHI23_IN_H          P%PHI23_IN_H 
P%PHI23_OUT_H         P%PHI23_OUT_H 
P%PHI34_IN_H          P%PHI34_IN_H 
P%PHI34_OUT_H         P%PHI34_OUT_H 
P%PHI41_IN_H          P%PHI41_IN_H 
P%PHI41_OUT_H         P%PHI41_OUT_H 
P%PP_CMD_H<2:0>       P%PP_CMD_H<2:0> 
P%PP_DATA_H<11:0>     P%PP_DATA_H<11:0> 
P%PWRFL_L             P%PWRFL_L 
P%SYS_RESET_L         P%SYS_RESET_L 
P%S_ERR_L             P%S_ERR_L 
P%TCK_H               P%TCK_H 
P%TDI_H               P%TDI_H 
P%TDO_H               P%TDO_H 
P%TEMP_H              P%TEMP_H 
P%TEST_DATA_H         P%TEST_DATA_H 
P%TEST_STROBE_H       P%TEST_STROBE_H 
P%TMS_H               P%TMS_H 
P%TS_ECC_H<5:0>       P%TS_ECC_H<5:0> 
P%TS_INDEX_H<20:5>    P%TS_INDEX_H<20:5> 
P%TS_OE_L             P%TS_OE_L 
P%TS_OWNED_H          P%TS_OWNED_H 
P%TS_TAG_H<31:17>     P%TS_TAG_H<31:17> 
P%TS_VALID_H          P%TS_VALID_H 
P%TS_WE_L             P%TS_WE_L 

3.8 Revision History 



Table 3-25: Revision History 

Who            When         Description of change 

Rebecca Stamm  20-Feb-1991  Update after NVAX first pass. Clarified ACK timing. Added signal 
                            name cross-reference. Added NDAL timing AC spec. Corrected Byte 
                            Enable table. Updated Bcache pin timing. E%TIMEOUT_ENABLE_H clocks 
                            the Cbox timeout counter, not E%TIMEOUT_BASE_H. Added P% prefix to 
                            all pin names. 

Rebecca Stamm  7-Nov-1990   PP_DATA are output only. Clarify NACK'd write handling. 
                            Correction: NVAX DOES receive the NDAL I/O signals during 
                            power-up. 

Rebecca Stamm  4-Jul-1990   Update initialization description. Assert Herr on unexpected fill. 
                            Update NDAL pin timing. NVAX may drive NOPs under WB_ONLY. 
                            P%ID_H<0> not driven with same value during command and data 
                            cycles of a write. Close force_bad_parity issue. 

Rebecca Stamm  17-May-1990  Take out vector pins, add two new test pins, update description of 
                            unexpected fill handling by setting CEFSTS<UNEXPECTED_FILL>. 

Rebecca Stamm  20-Feb-1990  Add unexpected RDE handling. Clarified byte enables and 
                            octaword-length transactions. Corrected running total for NVAX 
                            pins. Add detailed timeout description. Added timeout functionality 
                            to P%OSC_TC1_H. 

Rebecca Stamm  3-Feb-1990   External release. Updates from internal review. Address<2:0> is sent 
                            out as zeros for the second half of an unaligned I/O space reference. 
                            NVAX does not implement internal resistors to pull the NDAL to 
                            valid levels during reset; a system device must drive the bus during 
                            reset. 

Rebecca Stamm  30-Jan-1990  Reorganized chapter. Clarified byte enable section. NVAX issues 
                            identical data on both halves of the bus during I/O space writes. 
                            Released for internal review. 

Rebecca Stamm  01-Dec-1989  Revision 1.0 release. Clarified byte enable table. Added error 
                            handling for unexpected fills. Added error handling for requested 
                            writebacks whose OREADs do not complete. 

Rebecca Stamm  06-Mar-1989  Release for external review. 

Rebecca Stamm  24-Oct-1989  Several NVAX pins were added, deleted, or changed in either 
                            name or functionality. The terminology byte mask is changed 
                            to byte enable. IO1_WB_ONLY, IO2_WB_ONLY, IO1_SUPPRESS, 
                            and IO2_SUPPRESS were added, and NDAL arbitration was 
                            changed, giving the arbiter responsibility for asserting the 
                            appropriate WB_ONLY lines when a SUPPRESS line is asserted. 
                            Addition of BADWDATA command. New command encodings. Elimination 
                            of Read Lock and Write Unlock commands on the NDAL. Add 
                            better explanation of Clear Write Buffer. Update error section. 
                            Remove PARITY_ENABLE_L pin. Removed Qbus latency problem 
                            description. Assigned an ID to the memory interface. Read data 
                            may be returned in any order: NVAX does not require the requested 
                            quadword first, although it is a performance advantage to return the 
                            requested quadword first. 

Chapter 4 
Chip Overview 



4.1 NVAX CPU Chip Box and Section Overview 

The NVAX CPU chip is a single-chip CMOS-4 macropipelined implementation of the base 
instruction group and the optional vector instruction group of the VAX architecture. Included in 
the chip are: 

• CPU: Instruction fetch and decode, microsequencer, and execution unit 

• Control Store: 1600, 61-bit microwords 

• Primary Cache: 8 KB, 2-way set associative, physically-addressed, write through, mixed 
instruction and data stream 

• Instruction Cache: 2 KB, direct-mapped, virtually addressed, instruction stream only 

• Translation Buffer: 96 entries, fully associative 

• Floating Point: 4 stage, pipelined, integrated floating point unit with selective stage 4 
bypass 

• Backup Cache Interface: Support for four cache sizes (2MB, 512KB, 256KB, 128KB), two 
tag RAM speeds and three data RAM speeds. 

• NDAL Interface: Memory subsystem interface. Supports an ownership coherence protocol 
on the Backup Cache 

The NVAX chip is designed in CMOS-4 with a typical cycle time of 14 ns, and with the option of 
running chips at a slower or faster cycle time. The chip can be incorporated into many different 
system environments, ranging from the desktop to the midrange, and from single processor to 
multiprocessor systems. 

The NVAX is a macropipelined design: it pipelines macroinstruction decode and operand fetch 
with macroinstruction execution. Pipeline efficiency is increased by queuing up instruction 
information and operand values for later use by the execution unit. Thus, when the macropipeline 
is running smoothly, the Ibox (instruction parser/operand fetcher) is running several 
macroinstructions ahead of the Ebox (execution unit). Outstanding writes to registers or memory 
locations are kept in a scoreboard to ensure that data is not read before it has been written. 
See Chapter 5 for a more in-depth discussion of the macropipeline. 

This chapter gives an overview of the different sections, or "boxes", that comprise the NVAX 
CPU. For more information on any of the boxes, please see the appropriate chapters within this 
specification. Figure 4-1 is a block diagram of the boxes and the major buses that run between 
them. 

Figure 4-1: NVAX CPU Block Diagram 

4.1.1 The Ibox 



The Ibox decodes VAX instructions and parses operand specifiers. Instruction control, such as 
the control store dispatch address, is then placed in the instruction queue for later use by the 
Microsequencer and Ebox. The Ibox processes the operand specifiers at a rate of one specifier per 
cycle and, as necessary, initiates specifier memory read operations. All the information needed 
to access the specifiers is queued in the source queue and destination queue in the Ebox. 

The Ibox prefetches instruction stream data into the prefetch queue (PFQ), which can hold 16 
bytes. The Ibox has a dedicated instruction-stream-only cache, called the virtual instruction cache 
(VIC). The VIC is a 2 KB, direct-mapped cache, with a block and fill size of 32 bytes. 

The Ibox has both read and write ports to the GPR and MD portions of the Ebox register file 
which are used to process the operand specifiers. The Ibox maintains a scoreboard to ensure that 
reads and writes to the register file are always performed in synchronization with the Ebox. The 
Ibox stops processing instructions and operands upon issuing certain complex instructions (for 
example, CALL, RET, and character string instructions). This is done to maintain read/write 
ordering when the Ebox will be altering large amounts of VAX state. 

Since the Ibox is often parsing several macroinstructions ahead of the Ebox, the correct value 
for the PSL condition codes is not known at the time the Ibox executes a conditional branch 
instruction. Rather than emptying the pipe, the Ibox predicts which direction the branch will 
take, and passes this information on to the Ebox via the branch queue. The Ebox later signals 
if there was a misprediction, and the hardware backs out of the path. The branch prediction 
algorithm utilizes a 512-entry RAM, which caches four bits of branch history per entry. 

4.1.2 The Ebox and Microsequencer 

The Ebox and Microsequencer work together to perform the actual "work" of the VAX instructions. 
Together they implement a four stage micropipelined unit, which has the ability to stall and to 
microtrap. The Ebox and Microsequencer dequeue instruction and operand information provided 
by the Ibox via the instruction queue, the source queue, and the destination queue. For literal type 
operands, the source queue contains the actual operand value. In the case of register, memory, 
and immediate type operands, the source queue holds a pointer to the data in the Ebox register 
file. The contents of memory operands are provided by the Mbox based on earlier requests from 
the Ibox. GPR results are written directly back to the register file. Memory results are sent to 
the Mbox, where the data will be matched with the appropriate specifier address previously sent 
by the Ibox. At times, the Ebox initiates its own memory reads and writes using E%VA_BUS_L 
and E%WBUS_H. 

The Microsequencer determines the next microword to be fetched from the control store. It 
then provides this cycle-by-cycle control to the Ebox. The Microsequencer allows for eight-way 
microbranches, and for microsubroutines to a depth of six. 

The Ebox contains a five-port register file, which holds the VAX GPRs, six Memory Data Registers 
(MDs), six microcode working registers, and ten miscellaneous CPU state registers. It also 
contains an ALU, a shifter, and the VAX PSL. The Ebox uses the RMUX, controlled by the retire 
queue, to order the completion of Ebox and Fbox instructions. As the Ebox and the Fbox are 
distinct hardware resources, there is some amount of execution overlap allowed between the two 
units. 
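
The in-order completion described above can be pictured with a small model. This is a sketch only: the class name, the method names, and the use of one result FIFO per unit are hypothetical, and the real RMUX and retire queue are specified in the Ebox chapter.

```python
from collections import deque


class RetireQueue:
    """Toy model: results finish out of order but retire in program order."""

    def __init__(self):
        self.order = deque()                          # issuing unit per instruction
        self.ready = {"Ebox": deque(), "Fbox": deque()}

    def issue(self, unit: str) -> None:
        """Record, in program order, which unit will produce the next result."""
        self.order.append(unit)

    def complete(self, unit: str, result) -> None:
        """A unit signals that its oldest outstanding result is available."""
        self.ready[unit].append(result)

    def retire(self) -> list:
        """Retire results while the unit named at the head has one ready."""
        retired = []
        while self.order and self.ready[self.order[0]]:
            unit = self.order.popleft()
            retired.append((unit, self.ready[unit].popleft()))
        return retired
```

In this model an Ebox result that finishes early simply waits until the older Fbox result has been retired, which is one way to allow execution overlap without reordering completion.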

The Ebox implements specialized hardware features in order to speed the execution of certain 
VAX instructions: the population counter (CALLx, PUSHR, POPR), and the mask processing unit 
(CALLx, RET, FFx, PUSHR, POPR). The Ebox also has logic to gather hardware and software 
interrupt requests, and to notify the Microsequencer of pending interrupts. 

4.1.3 The Fbox 

The Fbox implements a four stage pipelined execution unit with selective stage 4 bypass for the 
floating point and integer multiply instructions. Operands are supplied by the Ebox up to 64 
bits per cycle on E%ABUS_H and E%BBUS_H. Results are returned to the Ebox 32 bits per cycle on 
F%FBOX_RESULT_H. The Ebox is responsible for storing the Fbox result in memory or the GPRs. 



DIGITAL CONFIDENTIAL 



Chip Overview 4-3 



NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 



4.1.4 The Mbox 

The Mbox receives read requests from the Ibox (both instruction stream and data stream) and 
from the Ebox (data stream only). It receives write/store requests from the Ebox. Also, the Cbox 
sends the Mbox fill data and invalidates for the Pcache. The Mbox arbitrates between these 
requesters, and queues requests which cannot currently be handled. Once a request is started, 
the Mbox performs address translation and cache lookup in two cycles, assuming there are no 
misses or other delays. The two-cycle Mbox operation is pipelined. 

The Mbox uses the translation buffer (96 fully associative entries) to map virtual to physical 
addresses. In the case of a TB miss, the memory management hardware in the Mbox will read 
the page table entry and fill the TB. The Mbox is also responsible for all access checks, TNV 
checks, M-bit checks, and quadword unaligned data processing. 

The Mbox houses the Primary Cache (Pcache). The Pcache is 8KB, 2-way set associative and 
writethrough, with a block and fill size of 32 bytes. The Pcache state is maintained as a subset 
of the Backup Cache. 

The Mbox ensures that Ibox specifier reads are ordered correctly with respect to Ebox specifier 
stores. This memory "scoreboarding" is accomplished by using the PA queue, a small list of 
physical addresses which have a pending Ebox store. 
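
A minimal sketch of this kind of memory scoreboarding is shown below. The queue depth, the quadword comparison granularity, and all names are assumptions made for illustration; the actual PA queue is described in the Mbox chapter.

```python
class PAQueue:
    """Toy PA queue: physical addresses that have a pending Ebox store."""

    def __init__(self, depth: int = 8, granule: int = 8):
        self.depth = depth        # assumed number of entries
        self.granule = granule    # assumed comparison granularity in bytes
        self.pending = []         # oldest entry first

    def _block(self, pa: int) -> int:
        return pa // self.granule

    def add_store_address(self, pa: int) -> bool:
        """Ibox supplies a destination address; False means the queue is full."""
        if len(self.pending) >= self.depth:
            return False
        self.pending.append(self._block(pa))
        return True

    def read_must_wait(self, pa: int) -> bool:
        """A specifier read conflicts if it hits an address with a pending store."""
        return self._block(pa) in self.pending

    def store_data_arrived(self) -> None:
        """Ebox supplies the data for the oldest pending store."""
        self.pending.pop(0)
```

A read that matches a pending entry is held until the corresponding store data arrives; reads to other addresses proceed without waiting.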

4.1.5 The Cbox 

The Cbox is the controller for the second level cache (the Backup Cache, or Bcache). Both the 
tags and data for the Bcache are stored in off-chip RAMs. The size and access time of the Bcache 
RAMs can be configured as needed by different system environments. The Bcache sizes supported 
are 2 MB, 512 KB, 256 KB, and 128 KB. In addition, a system with no Bcache RAMs is supported, 
although significant performance degradation occurs without a Bcache. The Bcache is a direct 
mapped writeback cache with block and fill sizes of 32 bytes. The Cbox packs sequential writes 
to the same quadword in order to minimize Bcache write accesses. Multiple write commands are 
held in the eight-entry WRITE_QUEUE. 
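
The write packing described above might look like the following sketch. Only the quadword granularity and the eight-entry depth come from the text; the entry layout, the rule that a write is packed only into the newest entry, and the names are assumptions.

```python
QUADWORD = 8   # bytes per quadword
DEPTH = 8      # entries in the write queue, per the text above


class WriteQueue:
    """Toy write queue that packs sequential writes to the same quadword."""

    def __init__(self):
        # Each entry is [quadword_address, {byte_offset: byte_value}].
        self.entries = []

    def write(self, addr: int, data: bytes) -> bool:
        """Queue a write lying within one quadword; False means the queue is full."""
        offset = addr % QUADWORD
        if offset + len(data) > QUADWORD:
            raise ValueError("write crosses a quadword boundary")
        qw = addr - offset
        new_bytes = {offset + i: b for i, b in enumerate(data)}
        if self.entries and self.entries[-1][0] == qw:
            self.entries[-1][1].update(new_bytes)   # pack into the newest entry
            return True
        if len(self.entries) >= DEPTH:
            return False
        self.entries.append([qw, new_bytes])
        return True
```

Two longword writes to the halves of one quadword occupy a single entry in this model, so they would cost one Bcache write access rather than two.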

The Cbox is also the interface to the NDAL, which is the NVAX connection to the memory subsys- 
tem. The NDAL_IN_QUEUE loads fill data and writeback requests from the NDAL to the CPU. 
The NON_WRITEBACK_QUEUE and WRITEBACK_QUEUE hold read requests and writeback 
data to be sent to the memory subsystem over the NDAL. 
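
The cache parameters quoted in this chapter imply the following address geometry. The sketch assumes the conventional split of an address into block offset, index, and tag, with power-of-two sizes; the function and field names are illustrative, and the cache chapters define the actual fields.

```python
def cache_geometry(size_bytes: int, ways: int, block_bytes: int = 32) -> dict:
    """Return set count and address bit positions for a set-associative cache."""
    sets = size_bytes // (ways * block_bytes)
    offset_bits = block_bytes.bit_length() - 1    # log2 of the block size
    index_bits = sets.bit_length() - 1            # log2 of the number of sets
    return {
        "sets": sets,
        "index_lsb": offset_bits,
        "index_msb": offset_bits + index_bits - 1,
        "tag_lsb": offset_bits + index_bits,
    }


pcache = cache_geometry(8 * 1024, ways=2)            # 8 KB, 2-way
vic = cache_geometry(2 * 1024, ways=1)               # 2 KB, direct-mapped
bcache_small = cache_geometry(128 * 1024, ways=1)    # smallest Bcache
bcache_large = cache_geometry(2 * 1024 * 1024, ways=1)
```

Under these assumptions the Bcache index spans address bits <16:5> at 128 KB and <20:5> at 2 MB, with the tag starting at bit 17 for the smallest size. This appears consistent with the P%TS_INDEX_H<20:5> and P%TS_TAG_H<31:17> pin ranges listed in Table 3-24.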

4.1.6 Major Internal Buses 

This is a list of the major interbox buses: 

• B%S6_DATA_H: 

This bidirectional bus between the Cbox and Mbox is used to transfer write data to the backup 
cache, and to transfer fill data to the primary cache. 

• C%CBOX_ADDR_H: 

This bus is used to transfer the physical address of a Pcache invalidate from the Cbox to the 
MBox. 

• E%ABUS_H, E%BBUS_H: 

These two 32-bit buses contain the A- and B-port operands for the Ebox, and are also used 
to transfer operand data to the Fbox. 

• E%IBOX_IA_BUS_L: 

This bus is used by the Ibox to read the Ebox Register File in order to perform an operand 
access. An example is to read a register's contents for a register deferred type specifier. 

• E%DQ_RETIRE_H, E%DQ_RETIRE_RMODE_H, E%DQ_RETIRE_RN_H: 

This collection of related buses transfers information from the Ebox to the Ibox when a 
destination queue entry is retired. 

• E%SQ_RETIRE_H, E%SQ_RETIRE_MD_H, E%SQ_RETIRE_RMODE_H, E%SQ_RETIRE_RN1_H, 
E%SQ_RETIRE_RN2_H: 

This collection of related buses transfers information from the Ebox to the Ibox when a source 
queue entry is retired. 

• E%VA_BUS_L: 

This bus transfers an address from the Ebox to the MBox. 

• E%WBUS_H: 

This 32-bit bus transfers write data from the RMUX to the register file and the Mbox. 

• E_USQ%MIB_H: 

This bus carries Control Store data from the Microsequencer to the Ebox. 

• E_BUS%UTEST_L: 

This 3-bit bus transfers microbranch conditions from the Ebox to the microsequencer. 

• F%FBOX_RESULT_H: 

This bus is used to transfer results from the Fbox to the Ebox. 

• I%IBOX_ADDR_H: 

This bus transmits the virtual address of an Ibox memory reference to the Mbox. The address 
may be for instruction prefetch or an operand access. 

• I%IQ_BUS_H: 

This bus carries instruction information from the Ibox to the Instruction Queue in the 
Microsequencer. 

• I%IBOX_IW_BUS_H: 

This bus is used by the Ibox to write the Ebox Register File for autoincrement/decrement type 
specifiers and to deliver immediate operands to the Register File. 

• I%OPERAND_BUS_H: 

This bus transfers information from the Ibox to the source and destination queues in the 
Ebox. 

• M%MD_BUS_H: 

This bus returns right-justified memory read data from the Mbox to either the Ibox (64 bits) 
or the Ebox (32 bits). 

• M%S6_PA_H: 

This bus transfers the address for a backup cache reference from the MBox to the Cbox. 

• NDAL: 

The NDAL are bidirectional off-chip multiplexed address and data lines used by the Cbox to 
communicate with the memory subsystem. The NDAL carries fill data and writeback requests 
to the CPU, and writeback data and read requests from the CPU to memory. 

4.2 Revision History 



Table 4-1: Revision History 

Who              When         Description of change 

Debra Bernstein  06-Mar-1989  Release for external review. 
Mike Uhler       18-Dec-1989  Update for second-pass release. 
Mike Uhler       04-Dec-1990  Update after pass 1 PG. 

Chapter 5 

Macroinstruction and Microinstruction Pipelines 



5.1 Introduction 

This chapter discusses the architecture of the NVAX CPU macroinstruction and microinstruction 
pipeline. It includes a section of general pipeline fundamentals to set the stage for the specific 
NVAX CPU implementation of the pipeline. This is followed by an overview of the NVAX CPU 
pipeline, an examination of macroinstruction execution, and a discussion of stall and exception 
handling from the viewpoint of the Ebox. 

5.2 Pipeline Fundamentals 

This section discusses the fundamentals of instruction pipelining in a general manner that is 
independent of the NVAX CPU implementation. It is intended as a primer for those readers who 
do not understand the concept and implications of instruction pipelining. Readers familiar with 
this material are encouraged to skip (or at most skim) this section. 

5.2.1 The Concept of a Pipeline 

The execution of a VAX macroinstruction involves a sequence of steps which are carried out 
in order to complete the macroinstruction operation. Among these steps are: instruction fetch, 
instruction decode, specifier evaluation and operand fetch, instruction execution, and result store. 
On the simplest machines, these steps are carried out sequentially, with no overlap of the steps, 
as shown in Figure 5-1. 



Figure 5-1: Non-Pipelined Instruction Execution 

               ------------------------------ Time ------------------------------> 

               +--+--+--+--+--+--+--+
Instruction 1  |S0|S1|S2|S3|S4|S5|S6|
               +--+--+--+--+--+--+--+
                                    +--+--+--+--+--+--+--+
Instruction 2                       |S0|S1|S2|S3|S4|S5|S6|
                                    +--+--+--+--+--+--+--+
                                                         +--+--+--+--+--+--+--+
Instruction 3                                            |S0|S1|S2|S3|S4|S5|S6|
                                                         +--+--+--+--+--+--+--+

In this diagram, "S0" through "S6" denote particular steps in the execution of an instruction. 
For this simple scheme, all of the steps for one instruction are performed, and the instruction is 
completed, before any of the steps for the next instruction are started. 

In more complex machines, one or more steps of the execution process are carried out in parallel 
with other steps. For example, consider Figure 5-2. 

Figure 5-2: Partially-Pipelined Instruction Execution 

               +--+--+--+--+--+--+--+
Instruction 1  |S0|S1|S2|S3|S4|S5|S6|
               +--+--+--+--+--+--+--+
                                 +--+--+--+--+--+--+--+
Instruction 2                    |S0|S1|S2|S3|S4|S5|S6|
                                 +--+--+--+--+--+--+--+
                                                   +--+--+--+--+--+--+--+
Instruction 3                                      |S0|S1|S2|S3|S4|S5|S6|
                                                   +--+--+--+--+--+--+--+

In this example, step S6 of each instruction is overlapped in time (or executed in parallel) with 
step S0 of the next instruction. In doing so, the number of instructions executed per unit time 
(instruction throughput) goes up because an instruction appears to take less time to complete. 

In the most complex machines, most (or all) of the steps are executed in parallel as indicated in 
Figure 5-3. 

Figure 5-3: Fully-Pipelined Instruction Execution 

In this example every step of instruction execution is performed in parallel with every other 
step. This means that a new instruction is started as soon as step S0 is completed for the 
previous instruction. If each step, S0..S6, took the same amount of time, the apparent instruction 
throughput would be seven times greater than that of Figure 5-1 above, even though each 
instruction takes the same amount of time to execute in both cases. 
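
The throughput comparison between the three figures can be checked with a short calculation. The sketch assumes that every step takes one time unit and that no stalls occur; the function name and the `overlap` parameter are introduced here for illustration.

```python
def total_time(n: int, steps: int = 7, overlap: int = 0) -> int:
    """Time units needed to finish n instructions of `steps` steps each.

    `overlap` is the number of steps of one instruction that run in
    parallel with the start of the next: 0 for Figure 5-1, 1 for
    Figure 5-2, and steps - 1 for Figure 5-3.
    """
    return steps + (n - 1) * (steps - overlap)


n = 1000
non_pipelined = total_time(n)                # 7000
partially = total_time(n, overlap=1)         # 6001
fully = total_time(n, overlap=6)             # 1006
speedup = non_pipelined / fully              # approaches 7 as n grows
```

Each instruction still takes seven time units from start to finish in all three cases; only the rate at which instructions complete changes.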

Figures 5-2 and 5-3 are examples of the concept of instruction pipelining, in which one or 
more steps necessary to execute an instruction are performed in parallel with steps for other 
instructions. 



5.2.2 Pipeline Flow 

A real-world form of a pipeline is an automobile assembly line. At each station of the assembly 
line (called segments of the pipeline in our case), a task is performed on the partially completed 
automobile and the result is passed on to the next station. At the end of the assembly line, the 
automobile is complete. 

In an instruction pipeline, as in an assembly line, each segment is responsible for performing a 
task and passing the completed result to the next segment. The exact task to be performed in 
each pipeline segment is a function of the degree of pipelining implemented and the complexity 
of the instruction set. 

One attribute of an automobile assembly line is equally important to an instruction pipeline: 
smooth and continuous flow. An automobile assembly line works well because the tasks to be 
performed at each station take about the same amount of time. This keeps the line moving at a 
constant pace, with no starts and stops which would reduce the number of completed automobiles 
per unit time. 

An analogous situation exists in an instruction pipeline. In order to achieve real efficiency in 
an instruction pipeline, information must flow smoothly and continuously from the start of the 
pipeline to the end. If a pipeline segment somewhere in the middle is not able to supply results 
to the next segment of the pipeline, the entire pipeline after the offending segment must stop, or 
stall, until the segment can supply a result. 

In the general case, a pipeline stall results when a pipeline segment can not supply a result to 
the next segment, or when it can not accept a new result from a previous segment. 

This is a fundamental problem with most instruction pipelines because they occasionally (or not 
so occasionally) stall. Stalls result in decreased instruction throughput because the smooth flow 
of the pipeline is broken. 

A typical example of a pipeline stall involves memory reads. A simple three-segment pipeline 
might fetch operands in segment 1, use the operands to compute results in segment 2, and make 
memory references or store results in segment 3, as shown in Figure 5-4. 

Figure 5-4: Simple Three-Segment Pipeline 

+-----------+    +-------------+    +-----------+
|  Operand  | -> | Computation | -> |  Memory   |
|  Access   |    |             |    |   Read    |
+-----------+    +-------------+    +-----------+

Figure 5-5 illustrates what happens when the pipeline control wants to use the result of the 
memory read as an operand. 

Figure 5-5: Information Flow Against the Pipeline 

        +-----------+    +-------------+    +-----------+
I1      |  Operand  | -> | Computation | -> |  Memory   |---+
        |  Access   |    |             |    |   Read    |   |
        +-----------+    +-------------+    +-----------+   |
                                                            |
    +-------------------------------------------------------+
    |
    |   +-----------+    +-------------+    +-----------+
I2  +-> |  Operand  | -> | Computation | -> |  Result   |
        |  Access   |    |             |    |   Store   |
        +-----------+    +-------------+    +-----------+

In this case, the operand access segment of I2 can not supply an operand to the computation 
segment because the memory read done by I1 has not yet completed. As a result, the pipeline 
must stall until the memory read has completed. This is shown in Figure 5-6. 

Figure 5-6: Stalls Introduced by Backward Pipeline Flow 

In this diagram, the memory read data from I1 is not available until the read request passes 
through segment 3 of the pipeline. But the operand access segment for I2 wants the data 
immediately. The result is that the operand access segment of I2 has to stall twice waiting for 
the memory read data to become available. This, in turn, stalls the rest of the pipeline segments 
after the operand access segment. 
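
The two stall cycles can be reproduced with simple cycle arithmetic. This sketch assumes that each segment takes one cycle and that read data becomes usable in the cycle after the read passes segment 3; it is one reading consistent with the text, and the names are illustrative.

```python
def operand_stall_cycles(producer_issue: int, consumer_issue: int,
                         read_segment: int = 3, access_segment: int = 1) -> int:
    """Cycles the consumer's operand access stalls waiting for the producer's read.

    An instruction issued in cycle c occupies segment s during cycle c + s - 1.
    """
    read_cycle = producer_issue + read_segment - 1       # read passes segment 3
    wanted_cycle = consumer_issue + access_segment - 1   # operand access starts
    return max(0, read_cycle + 1 - wanted_cycle)


# I1 issues in cycle 1 and I2 follows in cycle 2, as in Figure 5-5.
stalls = operand_stall_cycles(1, 2)
```

With back-to-back issue the result is two stall cycles; if I2 were issued three cycles after I1, the data would already be available and no stall would occur.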

This situation is an excellent example of an age-old problem with instruction pipelining. The 
natural and desired direction of information flow in a pipeline is from left to right in the above 
diagrams. In this case, information must flow from the output of the memory read segment into 
the operand access segment. This requires a right-to-left movement of information from a later 
pipeline segment to an earlier one. In general, any information transfer which goes against the 
normal flow of the pipeline has the potential for causing pipeline stalls. 

5.2.3 Stalls and Exceptions in an Instruction Pipeline 

Even the best pipeline design must be prepared to deal with stalls and exceptions created in the 
pipeline. As mentioned above, a stall is a condition in which a pipeline segment can not accept 
a new result from a previous segment, or can not send a result to a new segment. An exception 
occurs when a pipeline segment detects an abnormal condition which must stop, and then drain 
the pipeline. Examples of exceptions are: memory management faults, reserved operand faults, 
and arithmetic overflows. One of the inherent costs of a pipelined implementation is the extra 
logic necessary to deal with stalls and exceptions. 

There are two primary considerations concerning stalls: what action to take when one occurs, 
and how to minimize them in the first place. The design of most instruction pipelines assumes 
that the pipeline will not stall, and handles the stall condition as a special case, rather than 
the other way around. This means that each segment of the pipeline performs its function and 
produces a result each cycle. If a stall occurs just before the end of the cycle, the segment must 
block global state updates and repeat the same operation during the next cycle. The design of 
the pipeline control must take this into account and be prepared to handle the condition. 

A common stall condition occurs when each pipeline segment has the same average speed, but 
different peak speeds. For example, a pipeline segment whose task is to perform both memory 
references and register result stores may take longer to perform memory references than result 
stores. This can cause earlier segments of the pipeline to stall because the segment can not 
take new inputs as fast if it is doing a memory reference rather than a result store. A common 
technique to minimize this problem is to place buffers between pipeline segments, as shown in 
Figure 5-7. 

Figure 5-7: Buffers Between Pipeline Segments 

+---------+    +--------+    +-------------+    +--------+    +--------+
| Operand | -> | Buffer | -> | Computation | -> | Buffer | -> | Memory |
| Access  |    |        |    |             |    |        |    |  Read  |
+---------+    +--------+    +-------------+    +--------+    +--------+



By placing a buffer of sufficient depth between each segment of the pipeline, segments of differing 
peak speeds can avoid stalls caused if the next segment is unable to accept a new result. Instead, 
the result goes into the inter-segment buffer and the next segment removes it from the buffer 
when it needs it. Unfortunately, adding such buffers means that additional logic must also be 
added to handle the buffer full/buffer empty conditions. 
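
The effect of buffer depth can be illustrated with a small cycle-by-cycle simulation. The model is an assumption made for this example: a producer segment that can emit one result per cycle, a consumer segment whose time per item varies, and a FIFO of a given depth between them.

```python
from collections import deque


def producer_stalls(service_times: list, buffer_depth: int) -> int:
    """Count cycles in which the producer is blocked by a full buffer.

    service_times[i] is the number of cycles (at least 1) the consumer
    spends on item i.
    """
    buffer = deque()
    n = len(service_times)
    produced = consumed = busy = stalls = 0
    while consumed < n:
        # Consumer: finish the current item, then start the next if one waits.
        if busy > 0:
            busy -= 1
            if busy == 0:
                consumed += 1
        if busy == 0 and buffer:
            busy = service_times[buffer.popleft()]
        # Producer: emit one item per cycle when the buffer has room.
        if produced < n:
            if len(buffer) < buffer_depth:
                buffer.append(produced)
                produced += 1
            else:
                stalls += 1
    return stalls


# One slow item (for example a memory reference) among fast ones.
burst = [1, 3, 1, 1]
```

For this burst, a depth of one entry leaves the producer stalled for two cycles, while a depth of two absorbs the slow item completely. A buffer only smooths out bursts: if the consumer is slower on average, any finite buffer eventually fills.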

The performance advantage of an instruction pipeline comes from the parallelism built into the 
pipeline. If the parallelism is defeated by, for example, a stall, the advantage starts to drop. One 
problem associated with pipelines is that they can provide "lumpy" performance. That is, two 
similar programs may experience radically different performance if one causes many more stalls 
(which defeat the parallelism of the pipeline) than the other. 

Pipeline exceptions are different from stalls in that exceptions cause the pipeline to empty 
or drain. Usually, everything that entered the pipeline before the point of error is allowed 
to complete. Everything that entered the pipeline after the point of error is prevented from 
completing. This can add considerable complexity to the pipeline control. 

A larger problem occurs when the designer wants exceptions to be recoverable. Consider an 
exception caused by a memory management fault. On the VAX, this condition can occur because 
of a TB miss. The correct response to this fault is to read a PTE from memory, refill the TB, and 
restart the request that caused the fault. This can add considerable complexity to the design. 
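
The recoverable TB miss described above follows a detect, refill, and restart pattern, sketched below. The 512-byte page size is the VAX page size; everything else (the dictionary standing in for page table entries in memory, the class and attribute names, and the absence of a capacity limit, replacement, or access checks) is a simplification for illustration.

```python
PAGE_SIZE = 512  # VAX page size in bytes


class TranslationBuffer:
    """Toy TB that refills itself on a miss and restarts the request."""

    def __init__(self, page_table: dict):
        self.page_table = page_table   # stands in for PTEs in memory: VPN -> PFN
        self.entries = {}              # cached translations: VPN -> PFN
        self.misses = 0

    def translate(self, va: int) -> int:
        vpn, offset = divmod(va, PAGE_SIZE)
        while True:
            if vpn in self.entries:
                return self.entries[vpn] * PAGE_SIZE + offset
            # TB miss: read the PTE, fill the TB, then retry the same request.
            self.misses += 1
            self.entries[vpn] = self.page_table[vpn]
```

The request that missed is replayed from the beginning after the fill, which is why the state it touched before the miss must not have been committed. A missing page table entry raises `KeyError` in this sketch; real faults of that kind are not modeled.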

5.3 NVAX CPU Pipeline Overview 

The remainder of this chapter discusses the NVAX CPU pipeline, which is shown as a block 
diagram in Figure 5-8. This is a high-level view of the CPU and abstracts many of the details. 
For a more detailed view of the pipeline, users are encouraged to refer to the individual box 
chapters in this specification. 

The pipeline is divided into seven segments denoted as "S0" through "S6". In Figure 5-8, the 
components of each section of the CPU are shown in the segment of the pipeline in which they 
operate. 

The NVAX CPU is fully pipelined and, as such, is most similar to the abstract example 
shown in Figure 5-3. In addition to the overall macroinstruction pipeline, in which multiple 
macroinstructions are processed in the various segments of the pipeline, most of the sections 
also micropipeline operations. That is, if more than one operation is required to process a 
macroinstruction, the multiple operations are also pipelined within a section. 
5.3.1 Normal Macroinstruction Execution 

Execution of macroinstructions in the NVAX pipeline is decomposed into many smaller steps 
which are the distributed responsibility of the various sections of the chip. Because the NVAX 
CPU implements a macroinstruction pipeline, each section is relatively autonomous, with queues 
inserted between the sections to normalize the processing rates of each section. 

5.3.1.1 The Ibox 

The Ibox is responsible for fetching instruction stream data for the next instruction, decomposing 
the data into opcode and specifiers, and evaluating the specifiers with the goal of prefetching 
operands to support Ebox execution of the instruction. 

The Ibox is distributed across segments S0 through S3 of the pipeline, with most of the work 
being done in S1. In S0, instruction stream data is fetched from the virtual instruction cache 
(VIC) using the address contained in the virtual instruction buffer address register (VIBA). The 
data is written into the prefetch queue (PFQ) and VIBA is incremented to the next location. 

In segment S1, the PFQ is read and the burst unit uses internal state and the contents of 
the IROM to select the next instruction stream component, either an opcode or specifier. This 
decoding process is known as bursting. Some instruction components take multiple cycles to 
burst. For example, FD opcodes require two burst cycles: one for the FD byte, and one for the 
second opcode byte. Similarly, indexed specifiers require at least two burst cycles: one for the 
index byte, and one or more for the base specifier. 

When an opcode is decoded, the information is passed to the issue unit, which consults the IROM 
for the initial Ebox control store address of the routine which will process the instruction. The 
issue unit sends the address and other instruction-related information to the instruction queue 
where it is held until the Ebox reaches the instruction. 

When a specifier is decoded, the information is passed to the source and destination queue 
allocation logic and, potentially, to the complex specifier pipeline. The source and destination 
queue allocation logic allocates the appropriate number of entries for the specifier in the source 
and destination queues in the Ebox. These queues contain pointers to operands and results, and 
are discussed in more detail below. 

If the specifier is not a short literal or register specifier, which are collectively known as 
simple specifiers, it is considered to be a complex specifier and is processed by the small 
microcode-controlled complex specifier unit (CSU), which is distributed in segments S1 (control 
store access), S2 (operand access, including register file read), and S3 (ALU operation, Mbox 
request, GPR write) of the pipeline. The CSU pipeline computes all specifier memory addresses, 
and makes the appropriate request to the Mbox for the specifier type. To avoid reading or writing 
a GPR which is interlocked by a pending Ebox reference, the CSU pipeline includes a register 
scoreboard which detects data dependencies. The CSU pipeline also provides additional help to 
the Ebox by supplying operand information that is not an explicit part of the instruction stream. 
For example, the PC is supplied as an implicit operand for instructions that require it (such as 
BSBB). 

The branch prediction unit (BPU) watches each opcode that is decoded looking for conditional 
and unconditional branches. For unconditional branches, the BPU calculates the target PC and 
redirects PC and VIBA to the new path. For conditional branches, the BPU predicts whether 
the instruction will branch or not based on previous history. If the prediction indicates that the 
branch will be taken, PC and VIBA are redirected to the new path. The BPU writes the conditional 
branch prediction flag into the branch queue in the Ebox, to be used by the Ebox in the execution 
of the instruction. The BPU maintains enough state to restore the correct instruction PC if the 
prediction turns out to be incorrect. 
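
The redirect-and-restore behavior can be sketched in Python as follows. This is an illustrative model only: the class BranchRedirect and its method names are hypothetical, the prediction itself is passed in as a flag rather than computed from history, and the single saved slot reflects the one-unresolved-branch limit described under S1 stalls in Section 5.3.2.2.

```python
class BranchRedirect:
    """Toy model of redirecting the instruction stream on a prediction."""

    def __init__(self):
        # Saved alternate path for at most one unresolved conditional branch.
        self.alternate_pc = None

    def decode_conditional(self, fallthrough_pc, target_pc, predict_taken):
        """Return the PC to fetch from next, or None to model a stall."""
        if self.alternate_pc is not None:
            return None  # a second conditional branch before the first resolves
        if predict_taken:
            self.alternate_pc = fallthrough_pc
            return target_pc
        self.alternate_pc = target_pc
        return fallthrough_pc

    def resolve(self, prediction_correct):
        """Return the PC to restore on a misprediction, else None."""
        alternate, self.alternate_pc = self.alternate_pc, None
        return None if prediction_correct else alternate


bpu = BranchRedirect()
print(hex(bpu.decode_conditional(0x104, 0x200, predict_taken=True)))
print(hex(bpu.resolve(prediction_correct=False)))
```

The addresses in the usage lines are arbitrary example values.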

5.3.1.2 The Microsequencer 

The microsequencer operates in segment S2 of the pipeline and is responsible for supplying to 
the Ebox the next microinstruction to execute. If a macroinstruction requires the execution of 
more than one microinstruction, the microsequencer supplies each microinstruction in sequence 
based on directives included in the previous microinstruction. 

At macroinstruction boundaries, the microsequencer removes the next entry from the instruction 
queue, which includes the initial microinstruction address for the macroinstruction. If the 
instruction queue is empty, the microsequencer supplies the address of a special no-op 
microinstruction. 
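
The choice made at a macroinstruction boundary can be sketched in Python as below. The function name next_microaddress and the value of NOP_ADDRESS are hypothetical; the actual control store addresses are not given in this chapter.

```python
from collections import deque

# Hypothetical control store address of the special no-op microinstruction.
NOP_ADDRESS = 0x000


def next_microaddress(instruction_queue):
    """Return the initial microinstruction address for the next macroinstruction.

    instruction_queue holds initial addresses in instruction-stream order.
    If it is empty, the address of the no-op microinstruction is supplied.
    """
    if instruction_queue:
        return instruction_queue.popleft()
    return NOP_ADDRESS


queue = deque([0x1A0])
print(hex(next_microaddress(queue)), hex(next_microaddress(queue)))
```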

The microsequencer is also responsible for evaluating all exception requests, and for providing 
a pipeline flush control signal to the Ebox. For certain exceptions and interrupts, the 
microsequencer injects the address of a special microinstruction handler that is used to respond 
to the event. 

5.3.1.3 The Ebox 

The Ebox is responsible for executing all of the non-floating point instructions, for delivery of 
operands to and receipt of results from the Fbox, and for handling non-instruction events such 
as interrupts and exceptions. The Ebox is distributed through segments S3 (operand access, 
including register file read), S4 (ALU and shifter operation, RMUX request), and S5 (RMUX 
completion, register write, completion of Mbox request) of the pipeline. 

For the most part, instruction operands are prefetched by the Ibox, and addressed indirectly 
through the source queue. The source queue contains the operand itself for short literal specifiers, 
and a pointer to an entry in the register file for other operand types. 

An entry in the field queue is made when a field-type specifier entry is made into the source queue. 
The field queue provides microbranch conditions that allow the Ebox microcode to determine if 
a field-type specifier addresses either a GPR or memory. A microbranch on a valid field queue 
entry retires the entry from the queue. 

The register file is divided into four parts: the GPRs, memory data (MD) registers, working 
registers, and CPU state registers. For register-mode specifiers, the source queue points to the 
appropriate GPR in the register file. For other non-short literal specifier modes, the source queue 
points to an MD register. The MD register is either written directly by the Ibox, or by the Mbox 
as the result of a memory read generated by the Ibox. 

The S3 segment of the Ebox pipeline is responsible for selecting the appropriate operands for the 
Ebox and Fbox execution of instructions. Operands are selected onto E%ABUS_H and E%BBUS_H 
for use in both the Ebox and Fbox. In most instances, these operands come from the register file, 
although there are other data path sources of non-instruction operands (such as the PSL). 

Ebox computation is done by the ALU and the shifter in the S4 segment of the pipeline on 
operands supplied by the S3 segment. Control for these units is supplied by the microinstruction 
which was originally supplied to the S3 segment by the microsequencer, and then subsequently 
moved forward in the pipeline. 
The S4 segment also contains the RMUX, whose responsibility is to select results from either 
the Ebox or Fbox and perform the appropriate register or memory operation. The RMUX inputs 
come from the ALU, shifter, and F%FBOX_RESULT_H at the end of the cycle. The RMUX actually 
spans the S4/S5 boundary such that its outputs are valid at the beginning of the S5 segment. 
The RMUX is controlled by the retire queue, which specifies the source (either Ebox or Fbox) of 
the result to be processed (or retired) next. Non-selected RMUX sources are delayed until the 
retire queue indicates that they should be processed. 
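
The ordering role of the retire queue can be sketched in Python. This is a simplified model under stated assumptions: the function retire_in_order and the "E" and "F" tags are hypothetical, and each result is treated as a single opaque value.

```python
from collections import deque


def retire_in_order(retire_queue, ebox_results, fbox_results):
    """Retire results in instruction order.

    retire_queue: deque of "E" or "F" tags in instruction-stream order.
    ebox_results, fbox_results: deques of results ready in each unit.
    Returns the results retired; stops as soon as the selected source has
    nothing ready, so results from the other source are delayed.
    """
    sources = {"E": ebox_results, "F": fbox_results}
    retired = []
    while retire_queue:
        pending = sources[retire_queue[0]]
        if not pending:
            break  # selected source not ready; the non-selected source waits
        retire_queue.popleft()
        retired.append(pending.popleft())
    return retired


order = deque("EFE")
ebox = deque(["e1", "e2"])
fbox = deque()
print(retire_in_order(order, ebox, fbox))  # e2 must wait behind the Fbox result
fbox.append("f1")
print(retire_in_order(order, ebox, fbox))
```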

As the source queue points to instruction operands, so the destination queue points to the 
destination for instruction results. If the result is to be stored in a GPR, the destination queue 
contains a pointer to the appropriate GPR. If the result is to be stored in memory, the destination 
queue indicates that a request is to be made to the Mbox, which contains the physical address of 
the result in the PA queue (which is described below). This information is supplied as a control 
input to the RMUX logic. 

Once the RMUX selects the appropriate source of result information, it either requests Mbox 
service, or sends the result onto E%WBUS_H to be written back to the register file or to other data 
path registers in the S5 segment of the pipeline. The interface between the Ebox and Mbox for 
all memory requests is the EM_LATCH, which contains control information and may contain an 
address, data, or both, depending on the type of request. In addition to operands and results that 
are prefetched by the Ibox, the Ebox can also make explicit memory requests to the Mbox to read 
or write data. 

5.3.1.4 The Fbox 

The Fbox is responsible for executing all of the floating point instructions in the VAX base 
instruction group, as well as the longword-length integer multiply instructions. 

For each instruction that the Fbox is to execute, it receives from the microsequencer the opcode 
and other instruction-related information. The Fbox receives operand data from the Ebox on 
E%ABUS_H and E%BBUS_H. 

Execution of instructions is performed in a dedicated Fbox pipeline that appears in segment S4 
of Figure 5-8, but is actually a minimum of three cycles in length. Certain instructions, such 
as integer multiply, may require multiple passes through some segments of the Fbox pipeline. 
Other instructions, such as divide, are not pipelined at all. 

Fbox results and status are returned via F%FBOX_RESULT_H to the RMUX in the Ebox for 
retirement. When the instruction is next to retire, the RMUX hardware, as directed by the 
destination queue, sends the results to either the GPRs for register destinations, or to the Mbox 
for memory destinations. 

5.3.1.5 The Mbox 

The Mbox operates in the S5 and S6 segments of the pipeline, and is responsible for all memory 
references initiated by the other sections of the chip. Mbox requests can come from the Ibox 
(for VIC fills and for specifier references), the Ebox or Fbox via the RMUX and the EM_LATCH 
(for instruction result stores and for explicit Ebox memory requests), from the Mbox itself (for 
translation buffer fills and PTE reads), and from the Cbox (for invalidates and cache fills). 
All virtual references are translated to a physical address by the translation buffer (TB), which 
operates in the S5 segment of the pipeline. For instruction result references generated by the 
Ibox, the translated address is stored in the physical address queue (PA queue). These addresses 
are later matched with data from the Ebox or Fbox, when the result is calculated. 
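
The pairing of queued physical addresses with later result data can be sketched in Python. The class PAQueue, its method names, and the translate callback (a stand-in for the TB lookup) are hypothetical, and memory is modeled as a plain dictionary.

```python
from collections import deque


class PAQueue:
    """Toy model of a queue of translated destination addresses."""

    def __init__(self, translate):
        self._translate = translate  # stand-in for the TB lookup
        self._entries = deque()

    def add_destination(self, virtual_address):
        """Translate a destination address ahead of time and queue it."""
        self._entries.append(self._translate(virtual_address))

    def store(self, data, memory):
        """Pair result data with the oldest queued physical address.

        Returns False, modeling a stall, if no address is queued yet.
        """
        if not self._entries:
            return False
        memory[self._entries.popleft()] = data
        return True


memory = {}
pa_queue = PAQueue(lambda va: va + 0x1000)  # arbitrary example mapping
pa_queue.add_destination(0x20)
pa_queue.store(7, memory)
print(memory)
```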

For memory references, the physical address from either the TB or the PA queue is used to 
address the primary cache (Pcache) starting in the S5 segment of the pipeline and continuing 
into the S6 segment. Read data is available in the middle of the S6 segment, right-justified and 
returned to the requester on M%MD_BUS_H by the end of the cycle. Writes are also completed by 
the end of the cycle. Although the Pcache access spans the S5 and S6 segments of the pipeline, 
a new access can be started each cycle in the absence of a TB or cache miss. 

5.3.1.6 The Cbox 

The Cbox is responsible for maintaining and accessing the backup cache (Bcache), and for control 
of the off-chip bus (the NDAL). The Cbox receives input from the Mbox in the S6 segment of the 
pipeline, and usually takes multiple cycles to complete a request. For this reason, the Cbox is 
not shown in specific pipeline segments. 

If a memory read misses in the Pcache, the request is sent to the Cbox for processing. The 
Cbox first looks for the data in the Bcache and fills the Pcache from the Bcache if the data is 
present. If the data is not present in the Bcache, the Cbox requests a cache fill on the NDAL 
from memory. When memory returns the data, it is written to both the Bcache and to the Pcache 
(and potentially to the VIC). Although Pcache fills are done by making a request to the Mbox 
pipeline, data is returned to the original requester as quickly as possible by driving data directly 
onto B%S6_DATA_H, and from there onto M%MD_BUS_H as soon as the bus is free. 

Because the Pcache operates as a write-through cache, all memory writes are passed to the Cbox. 
To avoid multiple writes to the same Bcache block, the Cbox contains a write buffer in which 
multiple writes to the same quadwords are packed together before the Bcache is actually written. 
To maintain cache coherence with other system components, the Cbox acquires ownership of any 
data that is written to the cache. 
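
The idea of packing writes by quadword can be sketched in Python. This is not a description of the Cbox write buffer itself: the function pack_writes is hypothetical, writes are modeled one byte at a time, and the only fact assumed is that a VAX quadword is 8 bytes.

```python
def pack_writes(writes):
    """Merge byte writes that fall in the same aligned quadword (8 bytes).

    writes: iterable of (byte_address, byte_value) in program order.
    Returns a dict mapping each quadword base address to a dict of
    {byte_offset: value}; a later write to the same byte replaces an
    earlier one.
    """
    packed = {}
    for address, value in writes:
        base = address & ~0x7  # align down to a quadword boundary
        packed.setdefault(base, {})[address - base] = value
    return packed


# Four byte writes touch only two quadwords, so two packed entries result.
print(pack_writes([(0x100, 1), (0x103, 2), (0x108, 3), (0x100, 4)]))
```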

5.3.2 Stalls in the Pipeline 

Despite our best attempts at keeping the pipeline flowing smoothly, there are conditions which 
cause segments of the pipeline to stall. Conceptually, each segment of the pipeline can be 
considered as a black box which performs three steps every cycle: 

1. The task appropriate to the pipeline segment is performed, using control and inputs from the 
previous pipeline segment. The segment then updates local state (within the segment), but 
not global state (outside of the segment). 

2. Just before the end of the cycle, all segments send stall conditions to the appropriate state 
sequencer for that segment, which evaluates the conditions and determines which, if any, 
pipeline segments must stall. 

3. If no stall conditions exist for a pipeline segment, the state sequencer allows it to pass results 
to the next segment and accept results from the previous segment. This is accomplished by 
updating global state. 
This sequence of steps maximizes throughput by allowing each pipeline segment to assume that 
a stall will not occur (which should be the common case). If a stall does occur at the end of 
the cycle, global state updates are blocked, and the stalled segment repeats the same task (with 
potentially different inputs) in the next cycle (and the next, and the next) until the stall condition 
is removed. 

This description is over-simplified in some cases because some global state must be updated by a 
segment before the stall condition is known. Also, some tasks must be performed by a segment 
once and only once. These are treated specially on a case-by-case basis in each segment. 

Within a particular section of the chip, a stall in one pipeline segment also causes stalls in all 
upstream segments (those that occur earlier in the pipeline) of the pipeline. Unlike Rigel, stalls 
in one segment of the pipeline do not cause stalls in downstream segments of the pipeline. For 
example, a memory data stall in Rigel also caused a stall of the downstream ALU segment. In 
NVAX, a memory data stall does not stall the ALU segment (a no-op is inserted into the S4 
segment when S4 advances to S5). 
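
The three steps, together with the upstream-only stall propagation and the inserted no-op, can be sketched as a toy Python model. The names Segment and clock are hypothetical, each segment's task is an arbitrary function, and None stands for a no-op; real segments have far more state than this.

```python
class Segment:
    """One pipeline segment in a toy model."""

    def __init__(self, work):
        self.work = work      # task performed by the segment each cycle
        self.input = None     # global state: latch written at end of cycle
        self.result = None    # local state: recomputed every cycle
        self.stall = False    # stall condition, set by the caller


def clock(segments, new_input):
    """Advance the segments (earliest first) by one cycle.

    Returns (retired, accepted): the result leaving the last segment, or
    None, and whether the first segment accepted new_input.
    """
    # Step 1: each segment performs its task, updating local state only.
    for seg in segments:
        seg.result = None if seg.input is None else seg.work(seg.input)

    # Step 2: evaluate stall conditions. A stall in one segment also
    # blocks every upstream (earlier) segment, but no downstream segment.
    blocked = []
    stalled = False
    for seg in reversed(segments):
        stalled = stalled or seg.stall
        blocked.append(stalled)
    blocked.reverse()

    # Step 3: segments that are not blocked update global state.
    retired = None
    for i, seg in enumerate(segments):
        if blocked[i]:
            continue  # keep the same input and repeat the task next cycle
        if i == len(segments) - 1:
            retired = seg.result
        if i == 0:
            seg.input = new_input
        elif blocked[i - 1]:
            seg.input = None  # no-op inserted behind a stalled segment
        else:
            seg.input = segments[i - 1].result
    return retired, not blocked[0]
```

In this model a stalled segment keeps its input latch and recomputes the same result in the next cycle, while the segment after it advances and receives a no-op.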

There are a number of stall conditions in the chip which result in a pipeline stall. Each is 
discussed briefly below and in much more detail in the appropriate chapter of this specification. 

5.3.2.1 S0 Stalls 

Stalls that occur in the S0 segment of the pipeline are as follows: 
Ibox: 

* PFQ full: In normal operation, the VIC is accessed using the address in VIBA, the data is 
sent to the prefetch queue, and VIBA is incremented. If the PFQ is full, the increment of 
VIBA is blocked, and the data is re-referenced in the VIC until there is room for it in the 
PFQ. At that point, prefetch resumes. 

5.3.2.2 S1 Stalls 

Stalls that occur in the S1 segment of the pipeline are as follows: 
Ibox: 

* Insufficient PFQ data: The burst unit attempts to decode the next instruction component 
each cycle. If there are insufficient PFQ bytes valid to decode the entire component, the burst 
unit stalls until the required bytes are delivered from the VIC. 

* Source queue or destination queue full: During specifier decoding, the source and destination 
queue allocation logic must allocate enough entries in each queue to satisfy the requirements 
of the specifier being parsed. To guarantee that there will be sufficient resources available, 
there must be at least 2 free source queue entries and 2 free destination queue entries to 
complete the burst of the specifier. If there are insufficient free entries in either queue, the 
burst unit stalls until free entries become available. 

* MD file full: When a complex specifier is decoded, the source queue allocation logic must 
allocate enough memory data registers in the register file to satisfy the requirements of the 
specifier being parsed. To guarantee that there will be sufficient resources available, there 
must be at least 2 free memory data registers available to complete the burst of the specifier. 
If there are insufficient free registers, the burst unit stalls until enough memory data registers 
become available. 
* Second conditional branch decoded: The branch prediction unit predicts the path that each 
conditional branch will take and redirects the instruction stream based on that prediction. It 
retains sufficient state to restore the alternate path if the prediction was wrong. If a second 
conditional branch is decoded before the first is resolved by the Ebox, the branch prediction 
unit has nowhere to store the state, so the burst unit stalls until the Ebox resolves the actual 
direction of the first branch. 

* Instruction queue full: When a new opcode is decoded by the burst unit, the issue unit 
attempts to add an entry for the instruction to the instruction queue. If there are no free 
entries in the instruction queue, the burst unit stalls until a free entry becomes available, 
which occurs when an instruction is retired through the RMUX. 

* Complex specifier unit busy: If the burst unit decodes an instruction component that must 
be processed by the CSU pipeline, it makes a request for service by the CSU through an S1 
request latch. If this latch is still valid from a previous request for service (either due to a 
multi-cycle flow or a CSU stall), the burst unit stalls until the valid bit in the request latch 
is cleared. 

* Immediate data length not available: The length of the specifier extension for immediate 
specifiers is dependent on the data length of the specifier for that specific instruction. The 
data length information comes from one of the Ibox instruction PLAs which is accessed based 
on the opcode of the instruction. If the PLA access is not complete before an immediate 
specifier is decoded (which would have to be the first specifier of the instruction), the burst 
unit stalls for one cycle. 
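
Most of the S1 stall conditions listed above amount to a resource check before a component is burst, which can be sketched in Python. The names S1State and s1_stall_reasons and the component labels are hypothetical, the immediate data length stall is omitted, and the default field values simply describe a state in which nothing stalls.

```python
from dataclasses import dataclass


@dataclass
class S1State:
    """Snapshot of the resources the burst unit depends on (toy model)."""
    pfq_bytes_valid: int = 8
    free_source_entries: int = 2
    free_destination_entries: int = 2
    free_md_registers: int = 2
    free_instruction_queue_entries: int = 1
    csu_latch_valid: bool = False
    unresolved_conditional_branch: bool = False


def s1_stall_reasons(state, component, bytes_needed):
    """List the S1 stall conditions that apply to the component being burst.

    component is "opcode", "conditional_branch_opcode",
    "simple_specifier", or "complex_specifier".
    """
    reasons = []
    if state.pfq_bytes_valid < bytes_needed:
        reasons.append("insufficient PFQ data")
    if component in ("opcode", "conditional_branch_opcode"):
        if state.free_instruction_queue_entries == 0:
            reasons.append("instruction queue full")
        if (component == "conditional_branch_opcode"
                and state.unresolved_conditional_branch):
            reasons.append("second conditional branch decoded")
    else:
        if state.free_source_entries < 2 or state.free_destination_entries < 2:
            reasons.append("source queue or destination queue full")
        if component == "complex_specifier":
            if state.free_md_registers < 2:
                reasons.append("MD file full")
            if state.csu_latch_valid:
                reasons.append("complex specifier unit busy")
    return reasons


print(s1_stall_reasons(S1State(free_md_registers=1), "complex_specifier", 2))
```

An empty list means the burst unit can proceed in this model; any entry in the list means it stalls for the cycle.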

5.3.2.3 S2 Stalls 

Stalls that occur in the S2 segment of the pipeline are as follows: 
Ibox: 

* Outstanding Ebox or Fbox GPR write: In order to calculate certain specifier memory 
addresses, the CSU must read the contents of a GPR from the register file. If there is a 
pending Ebox or Fbox write to the register, the Ibox GPR scoreboard prevents the GPR read 
by stalling the S2 segment of the CSU pipeline. The stall continues until the GPR write 
completes. 

* Memory data not valid: For certain operations, the Ibox makes an Mbox request to return 
data which is used to complete the operation (e.g., the read done for the indirect address of a 
displacement deferred specifier). The Ibox MD register contains a valid bit which is cleared 
when a request is made, and set when data returns in response to the request. If the Ibox 
references the Ibox MD register when the valid bit is off, the S2 segment of the CSU pipeline 
stalls until the data is returned by the Mbox. 
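
The valid-bit handshake on a memory data register can be sketched in Python. The class ValidBitRegister and its method names are hypothetical; a read that returns None stands for a stall, which the caller would retry in the next cycle.

```python
class ValidBitRegister:
    """Toy model of a register guarded by a valid bit."""

    def __init__(self):
        self.value = 0
        self.valid = True

    def request(self):
        """A memory request is made: clear the valid bit."""
        self.valid = False

    def fill(self, value):
        """Data returns in response to the request: write it and set valid."""
        self.value = value
        self.valid = True

    def read(self):
        """Return the value, or None to model a stall while not valid."""
        return self.value if self.valid else None


md = ValidBitRegister()
md.request()
print(md.read())   # no data yet, so the reader would stall
md.fill(0x2000)
print(md.read())
```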

Microsequencer: 

* Instruction queue empty: The final microinstruction of a macroinstruction execution flow 
in the Ebox is indicated when a SEQ.MUX/LAST.CYCLE* microinstruction is decoded 
by the microsequencer. In response to this event, the Ebox expects to receive the first 
microinstruction of the next macroinstruction flow based on the initial address in the 
instruction queue. If the instruction queue is empty, the Microsequencer supplies the 
instruction queue stall microinstruction in place of the next macroinstruction flow. In effect, 
this stalls the microsequencer for one cycle. 
5.3.2.4 S3 Stalls 

Stalls that occur in the S3 segment of the pipeline are as follows: 
Ibox: 

* Outstanding Ebox GPR read: In order to complete the processing for auto-increment, 
auto-decrement, and auto-increment deferred specifiers, the CSU must update the GPR with 
the new value. If there is a pending Ebox read to the register through the source queue, the 
Ibox scoreboard prevents the GPR write by stalling the S3 segment of the CSU pipeline. The 
stall continues until the Ebox reads the GPR. 

* Specifier queue full: For most complex specifiers, the CSU makes a request for Mbox service 
for the memory request required by the specifier. If there are no free entries in the specifier 
queue, the S3 segment of the CSU pipeline stalls until a free entry becomes available. 

* RLOG full: Auto-increment, auto-decrement, and auto-increment deferred specifiers require 
a free RLOG entry in which to log the change to the GPR. If there are no free RLOG entries 
when such a specifier is decoded, the S3 segment of the CSU pipeline stalls until a free entry 
becomes available. 

Ebox: 

* Memory read data not valid: In some instances, the Ebox may make an explicit read request 
to the Mbox to return data in one of the 6 Ebox working registers in the register file. When 
the request is made, the valid bit on the register is cleared. When the data is written to the 
register, the valid bit is set. If the Ebox references the working register when the valid bit is 
clear, the S3 segment of the Ebox pipeline stalls until the entry becomes valid. 

* Field queue not valid: For each macroinstruction that includes a field-type specifier, the 
microcode microbranches on the first entry in the field queue to determine whether the field 
specifier addresses a GPR or memory. If the field queue is empty (indicating that the Ibox 
has not yet parsed the field specifier), the result of the next address calculation repeats the 
microbranch the next cycle. Although this is not a true stall, the effects are the same in that 
a microinstruction is repeated until the field queue becomes valid. 

* Outstanding Fbox GPR write: Because the Fbox computation pipeline is multiple cycles long, 
the Ebox may start to process subsequent instructions before the Fbox completes the first. 
If the Fbox instruction result is destined for a GPR that is referenced by a subsequent Ebox 
microword, the S3 segment of the Ebox pipeline stalls until the Fbox GPR write occurs. 

* Fbox instruction queue full: When an instruction is issued to the Fbox, an entry is added to 
the Fbox instruction queue. If there are no free entries in the queue, the S3 segment of the 
Ebox pipeline stalls until a free entry becomes available. 

Ebox/Fbox: 

* Source queue empty: Most instruction operands are prefetched by the Ibox, which writes 
a pointer to the operand value into the source queue. The Ebox then references up to two 
operands per cycle indirectly through the source queue for delivery to the Ebox or Fbox. If 
either of the source queue entries referenced is not valid, the S3 segment of the Ebox pipeline 
stalls until the entry becomes valid. 
• Memory operand not valid: Memory operands are prefetched by the Ibox, and the data is 
written by either the Mbox or Ibox into the memory data registers in the register file. If 
a referenced source queue entry points to a memory data register which is not valid, the S3 
segment of the Ebox pipeline stalls until the entry becomes valid. 

5.3.2.5 S4 Stalls 

Stalls that occur in the S4 segment of the pipeline are as follows: 
Ebox: 

• Branch queue empty: When a conditional or unconditional branch is decoded by the Ibox, an 
entry is added to the branch queue. For conditional branch instructions, the entry indicates 
the Ibox prediction of the branch direction. The branch queue is referenced by the Ebox to 
verify that the branch displacement was valid, and to compare the actual branch direction 
with the prediction. If the branch queue entry has not yet been made by the Ibox, the S4 
segment of the Ebox pipeline stalls until the entry is made. 

• Fbox GPR operand scoreboard full: The Ebox implements a register scoreboard to prevent 
the Ebox from reading a GPR to which there is an outstanding write by the Fbox. For each 
Fbox instruction which will write a GPR result, the Ebox adds an entry to the Fbox GPR 
scoreboard. If the scoreboard is full when the Ebox attempts to add an entry, the S4 segment 
of the Ebox pipeline stalls until a free entry becomes available. 

Fbox: 

• Fbox operand not valid: Instructions are issued to the Fbox when the opcode is removed 
from the instruction queue by the microsequencer. Operands for the instruction may not 
arrive until some time later. If the Fbox attempts to start the instruction execution when the 
operands are not yet valid, the Fbox pipeline stalls until the operands become valid. 

Ebox/Fbox: 

• Destination queue empty: Destination specifiers for instructions are processed by the Ibox, 
which writes a pointer to the destination (either GPR or memory) into the destination queue. 
The destination queue is referenced in two cases: when the Ebox or Fbox store instruction 
results via the RMUX, and when the Ebox tries to add the destination of Fbox instructions to 
the Ebox GPR scoreboard. If the destination queue entry is not valid (as would be the case if 
the Ibox has not completed processing the destination specifier), a stall occurs until the entry 
becomes valid. 

• PA queue empty: For memory destination specifiers, the Ibox sends the virtual address of the 
destination to the Mbox, which translates it and adds the physical address to the PA queue. 
If the destination queue indicates that an instruction result is in memory, a store request is 
made to the Mbox which supplies the data for the result. The Mbox matches the data with 
the first address in the PA queue and performs the write. If the PA queue is not valid when 
the Ebox or Fbox has a memory result ready, the RMUX stalls until the entry becomes valid. 
As a result, the source of the RMUX input (Ebox or Fbox) also stalls. 

• EM_LATCH full: All implicit and explicit memory requests made by the Ebox or Fbox pass 
through the EM_LATCH to the Mbox. If the Mbox is still processing the previous request 
when a new request is made, the RMUX stalls until the previous request is completed. As a 
result, the source of the RMUX input (Ebox or Fbox) also stalls. 
• RMUX selected to other source: Macroinstructions must be completed in the order in which 
they appear in the instruction stream. The Ebox retire queue determines whether the next 
instruction to complete comes from the Ebox or the Fbox. If the next instruction should come 
from one source and the other makes an RMUX request, the other source stalls until the 
retire queue indicates that the next instruction should come from that source. 

5.3.3 Exception Handling 

A pipeline exception occurs when a segment of the pipeline detects an event which requires that 
the normal flow of the pipeline be stopped in favor of another flow. There are two fundamental 
types of pipeline exceptions: those that resume the original pipeline flow once the exception is 
corrected, and those that require the intervention of the operating system. A TB miss on a 
memory reference is an example of the first type, and an access control violation is an example 
of the second type. M=0 faults are handled specially, as described below. 

Restartable exceptions are handled entirely within the confines of the section that detected the 
event. Other exceptions must be reported to the Ebox for processing. Because the NVAX CPU is 
macropipelined, exceptions can be detected by sections of the pipeline long before the instruction 
which caused the exception is actually executed by the Ebox or Fbox. However, the reporting of 
the exception is deferred until the instruction is executed by the Ebox or Fbox. At that point, an 
Ebox handler is invoked to process the event. 

Because the Ebox and Fbox are micropipelined, the point at which an exception handler is 
invoked must be carefully controlled. For example, three macroinstructions may be in execution in 
segments S3, S4, and S5 of the Ebox pipeline. If an exception is reported for the macroinstruction 
in the S3 segment, the two macroinstructions that are in the S4 and S5 segments must be allowed 
to complete before the exception handler is invoked. 

To accomplish this, the S4/S5 boundary in the Ebox is defined to be the commit point for a 
microinstruction. Architectural state is not modified before the S5 segment of the pipeline, unless 
there is some mechanism for restoring the original state if an exception is detected (the Ibox RLOG 
is an example of such a mechanism). Exception reporting is deferred until the microinstruction 
to which the event belongs attempts to cross the S4/S5 boundary. At that point, the exception 
is reported and an exception handler is invoked. By deferring exception reporting to this point, 
the previous microinstruction (which may belong to the previous macroinstruction) is allowed to 
complete. 

Most exceptions are reported by requesting a microtrap from the Microsequencer. When the 
Microsequencer receives a microtrap request, it causes the Ebox to break all its stalls, aborts 
the Ebox pipeline (by asserting E_USQ%PE_ABORT_L), and injects the address of a handler for 
the event into the control store address latch. This starts an Ebox microcode routine which will 
process the exception as appropriate. Certain other kinds of exceptions are reported by simply 
injecting the appropriate handler address into the control store at the appropriate point. 

The VAX architecture categorizes exceptions into two types: faults and traps. For both types, the 
microcode handler for the exception causes the Ibox to back out all GPR modifications that are 
in the RLOG, and retrieves the PC from the PC queue. For faults, the PC returned is the PC of 
the opcode of the instruction which caused the exception. For traps, the PC returned is the PC 
of the opcode of the next instruction to execute. The microcode then constructs the appropriate 
exception frame on the stack, and dispatches to the operating system through the appropriate 
SCB vector. 

There are a number of exceptions detected by the NVAX CPU pipeline, each of which is discussed 
briefly below, and in much more detail in the appropriate chapter of this specification. 
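
The fault and trap PC rule stated in the exception overview above can be sketched as follows. This is an illustrative model only: the PcQueueEntry record and the reported_pc function are hypothetical names, and the real PC queue layout is not described in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcQueueEntry:
    """Hypothetical view of the PCs the microcode handler can recover."""
    opcode_pc: int  # PC of the opcode of the instruction that caused the exception
    next_pc: int    # PC of the opcode of the next instruction to execute


def reported_pc(entry: PcQueueEntry, kind: str) -> int:
    """Select the PC placed in the exception frame for a fault or a trap."""
    if kind == "fault":
        return entry.opcode_pc   # faults restart the faulting instruction
    if kind == "trap":
        return entry.next_pc     # traps resume after the completed instruction
    raise ValueError(f"unknown exception kind: {kind!r}")
```

The only difference between the two cases is which PC is saved; the RLOG unwind and the SCB dispatch are common to both.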

5.3.3.1 Interrupts 

The CPU services interrupt requests from various sources between macroinstructions, and at 
selected points within the string instructions. Interrupt requests are received by the interrupt 
section and compared with the current IPL in the PSL. If the interrupt request is for an IPL 
that is higher than the current value in the PSL, a request is posted to the microsequencer. At 
the next macroinstruction boundary, the microsequencer substitutes the address of the microcode 
interrupt service routine for the instruction execution flow. 

The microcode handler then determines if there is actually an interrupt pending. If there is, it 
is dispatched to the operating system through the appropriate SCB vector. 
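
The IPL comparison performed by the interrupt section can be sketched as below. The function names are hypothetical, and choosing the highest requesting IPL when several requests are pending is an assumption of this sketch rather than something stated in this section.

```python
def request_is_posted(request_ipl: int, psl_ipl: int) -> bool:
    """A request is posted only if its IPL is higher than the current PSL IPL."""
    return request_ipl > psl_ipl


def highest_postable(request_ipls, psl_ipl):
    """Return the highest requesting IPL above psl_ipl, or None if none qualifies.

    Picking the highest qualifying request is an assumption of this sketch.
    """
    candidates = [ipl for ipl in request_ipls if request_is_posted(ipl, psl_ipl)]
    return max(candidates) if candidates else None
```

A request at exactly the current IPL is not posted, because the comparison is strictly "higher than".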

5.3.3.2 Integer Arithmetic Exceptions 

There are three integer arithmetic exceptions detected by the CPU, all of which are categorized 
as traps by the VAX architecture. This is significant because the event is not reported until after 
the commit point of the instruction, which allows that instruction to complete. 

Integer Overflow Trap 

An integer overflow is detected by the RMUX at the end of the S4 segment of the Ebox 
pipeline. If PSL<IV> is set and overflow traps are enabled by the microcode, the event is 
reported in segment S5 of the pipeline via a microtrap request. 

Integer Divide-By-Zero Trap 

An integer divide-by-zero is detected by the Ebox microcode routine for the instruction. It 
is reported by explicitly retiring the instruction and then jumping directly to the microcode 
handler for the event. 

Subscript Range Trap 

A subscript range trap is detected by the Ebox microcode routine for the INDEX instruction. 
It is reported by explicitly retiring the instruction and then jumping directly to the microcode 
handler for the event. 



5.3.3.3 Floating Point Arithmetic Exceptions 

All floating point arithmetic exceptions are detected by the Fbox pipeline during the execution of 
the instruction. The event is reported by the RMUX when it selects the Fbox as the source of the 
next instruction to process. At that point, a microtrap is requested. 

5.3.3.4 Memory Management Exceptions 

Memory management exceptions are detected by the Mbox when it processes a virtual read or 
write. This section covers actual memory management exceptions such as access control violation, 
translation not valid, and M=0 faults. Translation buffer misses are discussed separately in the 
next section. Because the reporting of memory management exceptions is specific to the operation 
that caused the exception, each case is discussed separately. 

• I-Stream Faults 

While the Ibox is decoding instructions, it may access a page which is not accessible due 
to a memory management exception. This may occur on the opcode, a specifier or specifier 
extension, or on a branch displacement. Should this occur, the Ibox sets a global MME 
fault flag and stops. Memory management exceptions detected on intermediate operations 
during specifier evaluation (such as a read for the indirect address of a displacement deferred 
specifier) are converted by the Ibox into source or destination faults, as described below. 

If the Ebox reaches the instruction which caused the exception (which may not happen due to, 
for example, interrupt, exception, or branch), it will reference one of the queues, which does 
not have a valid entry because the Ibox stopped when the error was detected. The particular 
queue depends on the instruction component on which the error was detected. If the Ibox 
global MME flag is set when an empty queue entry is referenced, the error is reported in one 
of the following ways. 

If the Ibox global MME flag is set when the micro-sequencer references an invalid instruction 
queue entry, it inserts the instruction queue stall into the pipeline and the Ebox qualifies it 
with the fault flag. When this flag reaches the S4 segment of the pipeline and is selected by 
the RMUX, a microtrap is requested. 

If the Ibox global MME flag is set when the Ebox references an invalid source queue entry, 
a fault flag is injected into either the Ebox or Fbox pipelines, depending on the type of 
instruction. To avoid a deadlock, S3 stalls do not prevent forward progress of the flag in 
the pipeline. When the flag reaches the S4 segment of the pipeline and is selected by the 
RMUX, a microtrap is requested. 

If the Ibox global MME flag is set when the Ebox microcode microbranches on an invalid field 
queue entry, a fault flag is injected into the Ebox pipeline. When the flag reaches the S4 
segment of the pipeline and is selected by the RMUX, a microtrap is requested. 

If the Ibox global MME flag is set when the Ebox references an invalid branch queue entry, 
and the RMUX selects the Ebox, a microtrap is requested. 

If the Ibox global MME flag is set when the RMUX references an invalid destination queue 
entry for a store request, a microtrap is requested. 

• Source Operand Faults 

If the Mbox detects a memory management exception during the translation for a source 
specifier, it qualifies the data returned to the MD file with a fault flag which is written into 
the MD file. When this entry is referenced by the Ebox, a fault flag is injected into the 
pipeline. To avoid a deadlock, S3 stalls do not prevent forward progress of the flag in the 
pipeline. When the flag reaches the S4 segment of the pipeline and is selected by the RMUX, 
a microtrap is requested. 

• Destination Address Faults 

If the Mbox detects a memory management exception during the translation for a destination 
specifier, it sets a fault flag in the PA queue entry for the address. When this entry is 
referenced by the RMUX, a microtrap is requested. 

• Faults on Explicit Ebox Memory Requests 

Explicit Ebox reads and writes are, by definition, performed in the context of the instruction 
which the Ebox is currently executing. If the Mbox detects a memory management exception 
that was the result of an explicit Ebox read or write, it requests an immediate microtrap to 
the memory management fault handler. 

• M=0 faults 

M=0 faults occur when the Mbox finds the M-bit clear in the PTE which is used to translate 
write-type references. The event is reported to the Ebox in one of the three ways described 
above: via the MD file or PA queue fault flags, or via an immediate microtrap for explicit 
Ebox writes. 

Unlike other memory management exceptions, which are dispatched to the operating system, 
M=0 faults are completely processed by the Ebox microcode handler. For normal instructions, 
the handler causes the Ibox to back out all GPR modifications that are in the RLOG and 
retrieves the PC from the PC queue. For string instructions, any RLOG entries that belong 
to the string instructions are not processed, and PSL<FPD> is set. Using the PTE address 
supplied by the Mbox, the Ebox microcode reads the PTE, sets the M-bit, and writes the 
PTE back to memory. The instruction stream is then restarted at the interrupted instruction 
(which may result in special FPD handling, as described below). 



5.3.3.5 Translation Buffer Miss 

Translation buffer misses are handled by the Mbox transparently to the rest of the CPU. When 
a reference misses in the translation buffer, the Mbox aborts the current reference and invokes 
the services of the memory management exception sequencer in the Mbox, which fetches the 
appropriate PTE from memory and loads it into the translation buffer. The original reference is 
then restarted. 
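
The miss, fill, and restart sequence can be sketched as below. This is a simplified model: the translation buffer and the page table are plain dictionaries from virtual page number to page frame number, protection checks and the actual PTE format are ignored, and the names translate and stats are hypothetical. The 512-byte page size is the standard VAX page size.

```python
PAGE_SHIFT = 9  # VAX pages are 512 bytes


def translate(tb: dict, page_table: dict, va: int, stats: dict) -> int:
    """Translate a virtual address, filling the TB on a miss and then retrying."""
    vpn = va >> PAGE_SHIFT
    offset = va & ((1 << PAGE_SHIFT) - 1)
    if vpn not in tb:
        # Miss: the reference is aborted, the PTE is fetched from memory and
        # loaded into the translation buffer, and the reference is restarted.
        stats["misses"] += 1
        tb[vpn] = page_table[vpn]
    return (tb[vpn] << PAGE_SHIFT) | offset
```

The caller sees the same result on a hit or a miss, which is the sense in which the miss is transparent to the rest of the CPU.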

5.3.3.6 Reserved Addressing Mode Faults 

Reserved addressing mode faults are detected by the Ibox for certain illegal combinations of 
specifier addressing modes and registers. When one of these combinations is detected, the Ibox 
sets a global addressing mode fault flag that indicates that the condition was detected and stops. 

If the Ibox global addressing mode fault flag is set when the Ebox references an invalid source 
queue entry, a fault flag is injected into either the Ebox or Fbox pipelines, depending on the type 
of instruction. To avoid a deadlock, S3 stalls do not prevent forward progress of the flag in the 
pipeline. The fault flag is carried along the Ebox or Fbox pipeline and passed to the RMUX, 
which reports the event by requesting a microtrap when that source is selected. 

If the Ibox global addressing mode fault flag is set when the Ebox microcode microbranches on 
an invalid field queue entry, a fault flag is injected into the Ebox pipeline. When the flag reaches 
the S4 segment of the pipeline and is selected by the RMUX, a microtrap is requested. 

Similarly, if the Ibox global addressing mode fault flag is set when the RMUX, in response to 
a request by the Ebox or Fbox, references an invalid destination queue entry, a microtrap is 
requested. 

5.3.3.7 Reserved Operand Faults 

Reserved operand faults for floating point operands are detected by the Fbox, and reported in the 
same manner as the floating point arithmetic exceptions described above. 

Other reserved operand faults are detected by Ebox microcode as part of macroinstruction 
execution flows and are reported by jumping directly to the fault handler. 

5.3.3.8 Exceptions Occurring as the Consequence of an Instruction 

Opcode-specific exceptions such as reserved instruction faults, breakpoint faults, etc., are 
dispatched directly to handlers by placing the address of the handler in the instruction PLA 
for each instruction. 

Other instruction-related faults, such as privileged instruction faults, are detected in execution 
flows by the Ebox microcode and are reported by jumping directly to the fault handler. 

For testability, the Fbox may be disabled. If this is the case, integer multiply instructions 
are executed by the Ebox microcode and floating point instructions are converted into reserved 
instruction faults for emulation by software. When the first Ebox microinstruction of an Fbox 
operand flow for a floating point macroinstruction reaches the S4 segment of the pipeline, a 
microtrap is requested. The handler for this microtrap then jumps directly to the reserved 
instruction fault handler. 

5.3.3.9 Trace Fault 

Trace faults are detected by the microsequencer with some help from the Ebox. The 
microsequencer maintains a duplicate copy of PSL<TP>, which it updates as required to track 
the state of the PSL copy as it would exist when the instruction is executed by the Ebox. At 
the end of a macroinstruction, the microsequencer logically ORs its local copy of the TP bit with 
PSL<TP>. If either is set, the microsequencer substitutes the address of the microcode trace fault 
handler for the address of the next macroinstruction. 
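
The end-of-macroinstruction check can be sketched as below. The function and parameter names are hypothetical, and the two addresses simply stand in for control store addresses.

```python
def next_dispatch_address(local_tp: bool, psl_tp: bool,
                          next_instruction_addr: int,
                          trace_handler_addr: int) -> int:
    """Pick the next microcode entry point at a macroinstruction boundary.

    The microsequencer ORs its local TP copy with PSL<TP>; if either is set,
    the trace fault handler replaces the next macroinstruction's entry point.
    """
    if local_tp or psl_tp:
        return trace_handler_addr
    return next_instruction_addr
```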

5.3.3.10 Conditional Branch Mispredict 

When the Ibox decodes a conditional branch, it predicts the path that the branch will take and 
places its prediction into the branch queue. When the Ebox reaches the instruction, it evaluates 
the actual path that the branch took and compares it in the S5 segment of the Ebox pipeline with 
the Ibox prediction. If the two are different, the Ibox is notified that the branch was mispredicted 
and a microtrap request is made to abort the Ebox and Fbox pipelines. The Ibox flushes itself, 
backs out any GPR modifications that are in the RLOG, and redirects the instruction stream to 
the alternate path. The Ebox microcode handler for this event cleans up certain machine state 
and waits for the first instruction from the alternate path. 

5.3.3.11 First Part Done Handling 

During the execution of one of the 8 string instructions that are implemented by the CPU, an 
exception or an interrupt may be detected. In that event, the Ebox microcode saves all state 
necessary to resume the instruction in the GPRs, backs up PC to point to the opcode of the string 
instruction, sets PSL<FPD> in the saved PSL, and dispatches to the handler for the interrupt or 
exception. 

When the interrupt or exception is resolved, the software handler terminates with an REI back to 
the instruction. When the Ibox decodes an instruction with PSL<FPD> set, it stops parsing the 
instruction immediately after the opcode. In particular, it does not parse the specifiers. When the 
microsequencer finds PSL<FPD> set at a macroinstruction boundary, it substitutes the address 
of a special FPD handler for the instruction execution flow. 

The FPD handler determines which instruction is being resumed from the opcode, unpacks the 
state saved in the GPRs, clears PSL<FPD>, advances PC to the end of the string instruction (by 
adding the opcode PC to the length of the instruction, which was part of the saved state), and 
jumps back to the middle of the interrupted instruction. 
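
The PC adjustment made by the FPD handler is a single addition, sketched below. The function name is hypothetical; the 32-bit mask reflects the 32-bit VAX virtual address.

```python
def pc_after_string_instruction(opcode_pc: int, instruction_length: int) -> int:
    """Advance PC from the opcode of the string instruction to its end.

    instruction_length is the length recorded in the state saved in the GPRs
    when the instruction was interrupted.
    """
    return (opcode_pc + instruction_length) & 0xFFFFFFFF
```

Because the Ibox stops parsing immediately after the opcode when PSL<FPD> is set, this addition is what positions PC past the specifiers that were not re-parsed.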

5.3.3.12 Cache and Memory Hardware Errors 

Cache and memory hardware errors are detected by the Mbox or Cbox, depending on the type 
of error. If the error is recoverable (e.g., a Pcache tag parity error on a write simply disables 
the Pcache), it is reported via a soft error interrupt request and is dispatched to the operating 
system. 

In some instances, write errors that are not recoverable by hardware are reported via a hard 
error interrupt request, which results in the invocation of the operating system. 

Read errors that are not recoverable by hardware are reported via the assertion of a soft error 
interrupt, and also in a manner that is similar to that used for memory management exceptions, 
as described above. In fact, the MD file, PA queue, and the Ibox all contain a hardware error flag 
in parallel with the memory management fault flag. With the exception of TB parity errors, which 
cause an immediate microtrap request, the event is reported to the Ebox in exactly the same way 
as the equivalent memory management exception would be, but the microcode exception handler 
is different. For example, an unrecoverable error on a specifier read would set the hardware error 
flag in the MD file. When the flag is referenced, the error flag is injected into the pipeline. When 
the flag advances to the S4 segment and is selected by the RMUX, it causes a microtrap request 
which invokes a hardware error handler rather than a memory management handler. 

Note that certain other errors are reported in the same way. For example, if the memory 
management sequencer in the Mbox receives an unrecoverable error trying to read a PTE 
necessary to translate a destination specifier, it sets the hardware error flag in the PA queue 
for the entry corresponding to the specifier. This results in a microtrap to the hardware error 
handler when the entry is referenced. PTE read errors for read references are also reported via 
the original reference. 

5.4 Revision History 

Table 5-1: Revision History 

Who          When          Description of change 
Mike Uhler   06-Mar-1989   Release for external review. 
Mike Uhler   19-Dec-1989   Update for second-pass release. 
Mike Uhler   02-Feb-1991   Update after pass 1 PG. 

Chapter 6 

Microinstruction Formats 



6.1 Ebox Microcode 

The NVAX microword consists of 61 bits divided into two major sections. Bits <60:15> control 
the Ebox Data Path and are encoded into two formats. Bits <14:0> control the Microsequencer 
and are also encoded into two formats. 



6.1.1 Data Path Control 

The Data Path Control Microword specifies all the information needed to control the Ebox Data 
Path. The two formats, Standard and Special, are selected by bit <60>, the FORMAT bit. In 
addition, bit <45>, the LIT bit, selects the constant generation format of the microword, which 
may be either an 8-bit constant or a 10-bit constant, depending on a decode in the MISC field. 
Pictures of the microword formats are in Figure 6-1 and Figure 6-2. A brief description of each 
field is given in Table 6-1 and Table 6-2. 

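Figure 6-1: Ebox Data Path Control, Standard Format 

  60   59:55   54:50   49   48:46   45   44:40   39:35   34   33   32   31:26   25:20   19:15
+----+-------+-------+----+-------+----+-------+-------+----+----+----+-------+-------+-------+
| 0  |  ALU  |  MRQ  | Q  |  SHF  | 0  |  VAL  |   B   | L  | W  | V  |  DST  |   A   | MISC  |
+----+-------+-------+----+-------+----+-------+-------+----+----+----+-------+-------+-------+

Constant generation variants (bit <45> = 1), replacing bits <44:35>:

  45   44:43   42:35
+----+-------+-------+
| 1  |  POS  | CONST |     MISC not equal CONST.10
+----+-------+-------+

  45       44:35
+----+---------------+
| 1  |   CONST.10    |     MISC equal CONST.10
+----+---------------+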

Table 6-1: EBOX Data Path Control Microword Fields, Standard Format 

Bit Position  Microword Field  Microword Format  Description 
60            FORMAT           -                 Microword format: Standard or Special 
59:55         ALU              Both              ALU function select 
54:50         MRQ              Both              Mbox request select 
49            Q                Standard          Q register load control 
48:46         SHF              Standard          Shifter function select 
45            LIT              Both              ALU/shifter B port control: register or literal 
44:40         VAL              Standard          Constant shift amount 
39:35         B                Both              ALU/shifter B port select 
44:43         POS              Both              Constant position 
42:35         CONST            Both (2)          8-bit constant value 
44:35         CONST.10         Both (3)          10-bit constant value 
34            L                Both              Length control 
33            W                Both              Wbus driver control 
32            V                Both              VA write enable 
31:26         DST              Both              WBUS destination select 
25:20         A                Both              ALU/shifter A port select 
19:15         MISC             Both              Miscellaneous function select, group 0 

(1) NOT Constant generation microword variant 
(2) 8-Bit Constant generation microword variant, when MISC field not equal CONST.10 
(3) 10-Bit Constant generation microword variant, when MISC field equal CONST.10 



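Figure 6-2: Ebox Data Path Control, Special Format 

  60   59:55   54:50   49:46   45   44:41   40   39:35   34   33   32   31:26   25:20   19:15
+----+-------+-------+-------+----+-------+----+-------+----+----+----+-------+-------+-------+
| 1  |  ALU  |  MRQ  | MISC1 | 0  | MISC2 | D  |   B   | L  | W  | V  |  DST  |   A   | MISC  |
+----+-------+-------+-------+----+-------+----+-------+----+----+----+-------+-------+-------+

D = DISABLE.RETIRE

Constant generation variants (bit <45> = 1), replacing bits <44:35>:

  45   44:43   42:35
+----+-------+-------+
| 1  |  POS  | CONST |     MISC not equal CONST.10
+----+-------+-------+

  45       44:35
+----+---------------+
| 1  |   CONST.10    |     MISC equal CONST.10
+----+---------------+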



Table 6-2: EBOX Data Path Control Microword Fields, Special Format 

Bit Position  Microword Field  Microword Format  Description 
60            FORMAT           -                 Microword format: Standard or Special 
59:55         ALU              Both              ALU function select 
54:50         MRQ              Both              Mbox request select 
49:46         MISC1            Special           Miscellaneous function select, group 1 
45            LIT              Both              ALU/shifter B port control: register or literal 
44:41         MISC2            Special           Miscellaneous function select, group 2 
40            DISABLE.RETIRE   Special           Instruction retire disable 
39:35         B                Both              ALU/shifter B port select 
44:43         POS              Both              Constant position 
42:35         CONST            Both (2)          8-bit constant value 
44:35         CONST.10         Both (3)          10-bit constant value 
34            L                Both              Length control 
33            W                Both              Wbus driver control 
32            V                Both              VA write enable 
31:26         DST              Both              WBUS destination select 
25:20         A                Both              ALU/shifter A port select 
19:15         MISC             Both              Miscellaneous function select, group 0 

(1) NOT Constant generation microword variant 
(2) 8-Bit Constant generation microword variant, when MISC field not equal CONST.10 
(3) 10-Bit Constant generation microword variant, when MISC field equal CONST.10 
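
As an illustration of how the two data path formats and the constant generation variants overlay the same bits, the following sketch decodes bits <60:15> of a microword held in an integer. It rests on several assumptions that are not stated in the text: FORMAT = 0 is taken as Standard and FORMAT = 1 as Special, LIT = 1 is taken to select a constant generation variant, and the MISC encoding that selects CONST.10 is passed in as the hypothetical parameter misc_const10 because its value is not given in this chapter.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Extract bits <hi:lo> of an integer microword."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


COMMON = {"FORMAT": (60, 60), "ALU": (59, 55), "MRQ": (54, 50), "LIT": (45, 45),
          "L": (34, 34), "W": (33, 33), "V": (32, 32),
          "DST": (31, 26), "A": (25, 20), "MISC": (19, 15)}
STANDARD_ONLY = {"Q": (49, 49), "SHF": (48, 46)}
SPECIAL_ONLY = {"MISC1": (49, 46)}
STANDARD_NO_CONST = {"VAL": (44, 40), "B": (39, 35)}
SPECIAL_NO_CONST = {"MISC2": (44, 41), "DISABLE.RETIRE": (40, 40), "B": (39, 35)}


def decode_data_path(word: int, misc_const10=None) -> dict:
    """Decode the data path control fields of one microword."""
    def take(layout):
        return {name: bits(word, hi, lo) for name, (hi, lo) in layout.items()}

    fields = take(COMMON)
    special = fields["FORMAT"] == 1      # assumption: 0 = Standard, 1 = Special
    fields.update(take(SPECIAL_ONLY if special else STANDARD_ONLY))
    if fields["LIT"] == 0:               # assumption: LIT = 0 means no constant
        fields.update(take(SPECIAL_NO_CONST if special else STANDARD_NO_CONST))
    elif misc_const10 is not None and fields["MISC"] == misc_const10:
        fields["CONST.10"] = bits(word, 44, 35)
    else:
        fields["POS"] = bits(word, 44, 43)
        fields["CONST"] = bits(word, 42, 35)
    return fields
```

Keeping the overlapping fields in separate layouts makes it explicit that VAL and B (or MISC2, DISABLE.RETIRE, and B) are not available when a constant occupies bits <44:35>.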



6.1.2 Microsequencer Control 

The Microsequencer Control Microword supplies the information necessary for the 
Microsequencer to calculate the address of the next microinstruction. The basic computation 
done by the Microsequencer involves selecting a base address from one of several sources, and 
then optionally modifying three bits of the base address to get the final next address. 

Bit <14>, SEQ.FMT, selects between Jump and Branch formats. Figure 6-3 and Figure 6-4 show 
the two formats. Table 6-3 and Table 6-4 describe each of the fields. 



DIGITAL CONFIDENTIAL 






NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 

Figure 6-3: Ebox Microsequencer Control, Jump Format 

  14   13   12:11     10:0
+----+----+-------+-----------+
| 0  | S  |  MUX  |     J     |
+----+----+-------+-----------+

S = SEQ.CALL, MUX = SEQ.MUX



Table 6-3: Ebox Microsequencer Control Microword Fields, Jump Format 

Bit Position  Microword Field  Microword Format  Description 
14            SEQ.FMT          -                 Microsequencer format: Jump or Branch 
13            SEQ.CALL         Both              Subroutine call 
12:11         SEQ.MUX          Jump              Next address select 
10:0          J                Jump              Next address 



Figure 6-4: Ebox Microsequencer Control, Branch Format 

  14   13     12:8      7:0
+----+----+----------+--------+
| 1  | S  | SEQ.COND | BR.OFF |
+----+----+----------+--------+

S = SEQ.CALL



Table 6-4: Ebox Microsequencer Control Microword Fields, Branch Format 

Bit Position  Microword Field  Microword Format  Description 
14            SEQ.FMT          -                 Microsequencer format: Jump or Branch 
13            SEQ.CALL         Both              Subroutine call 
12:8          SEQ.COND         Branch            Microbranch condition select 
7:0           BR.OFF           Branch            Page offset of next address 
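
The Jump and Branch formats can be told apart and unpacked as sketched below. The sketch assumes SEQ.FMT = 0 selects the Jump format and SEQ.FMT = 1 the Branch format; the function name is hypothetical. It covers field extraction only, not the next-address computation itself.

```python
def decode_sequencer(word: int) -> dict:
    """Decode microsequencer control bits <14:0> of a microword."""
    ctl = word & 0x7FFF
    call = (ctl >> 13) & 0x1
    if ((ctl >> 14) & 0x1) == 0:         # assumption: 0 = Jump format
        return {"format": "jump", "SEQ.CALL": call,
                "SEQ.MUX": (ctl >> 11) & 0x3,   # bits <12:11>
                "J": ctl & 0x7FF}               # bits <10:0>
    return {"format": "branch", "SEQ.CALL": call,
            "SEQ.COND": (ctl >> 8) & 0x1F,      # bits <12:8>
            "BR.OFF": ctl & 0xFF}               # bits <7:0>
```

Masking with 0x7FFF first means the same function can be handed a full 61-bit microword; the data path bits <60:15> are simply ignored.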



6.2 Ibox CSU Microcode 

The Ibox complex specifier unit is controlled by a 29-bit microword, as shown in Figure 6-5. A 
brief description of each field is given in Table 6-5. 

Figure 6-5: Ibox CSU Format 

  28:26   25   24:22   21:19   18:16   15:13   12:9     8:7     6:0
+-------+----+-------+-------+-------+-------+-------+-------+-------+
|  ALU  | DL |   A   |   B   |  DST  | MISC  | MREQ  |  MUX  |  NXT  |
+-------+----+-------+-------+-------+-------+-------+-------+-------+

MUX = MUX.CNT




Table 6-5: Ibox CSU Microword Fields 

Bit Position  Microword Field  Description 
28:26         ALU              ALU function select 
25            DL               Data length control 
24:22         A                ALU A port select 
21:19         B                ALU B port select 
18:16         DST              Wbus destination 
15:13         MISC             Miscellaneous function select 
12:9          MREQ             Mbox request select 
8:7           MUX.CNT          Next address mux select 
6:0           NXT              Next address 
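
As a cross-check on Table 6-5, the sketch below verifies that the listed fields tile all 29 bits exactly once and shows a pack and unpack round trip. The helper names are hypothetical; only the bit positions come from the table.

```python
CSU_FIELDS = (("ALU", 28, 26), ("DL", 25, 25), ("A", 24, 22), ("B", 21, 19),
              ("DST", 18, 16), ("MISC", 15, 13), ("MREQ", 12, 9),
              ("MUX.CNT", 8, 7), ("NXT", 6, 0))


def layout_is_exact(fields, width: int) -> bool:
    """True when the fields cover bits <width-1:0> exactly once, high to low."""
    expected_hi = width - 1
    for _, hi, lo in fields:
        if hi != expected_hi or lo > hi:
            return False
        expected_hi = lo - 1
    return expected_hi == -1


def pack_csu(values: dict) -> int:
    """Build a 29-bit CSU microword; fields not named in values default to 0."""
    word = 0
    for name, hi, lo in CSU_FIELDS:
        value = values.get(name, 0)
        if value >> (hi - lo + 1):
            raise ValueError(f"{name} does not fit in bits <{hi}:{lo}>")
        word |= value << lo
    return word


def unpack_csu(word: int) -> dict:
    """Split a 29-bit CSU microword into its named fields."""
    return {name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
            for name, hi, lo in CSU_FIELDS}
```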



6.3 Ibox Instruction ROM and Control PLAs 

The Ibox instruction decode is controlled by several ROMs and PLAs that are generated from a 
single source file whose format is shown in Figure 6-6. A brief description of each field is given 
in Table 6-6. A more detailed description of the control information as it is actually found in the 
hardware is given in Table 7-12. 

Figure 6-6: Ibox Instruction ROM Format 

    64:56     55    54:53    52   51   50   49   48:46     45
+-----------+----+---------+----+----+----+----+--------+-------+
| EXEC_DISP | VS | ST_SPCQ | DS | B  | V  | FB | SP_CNT | A_CNT |
+-----------+----+---------+----+----+----+----+--------+-------+

  44:41    40:39   38:36    35:32
+--------+-------+-------+---------+
| A1_REG | A1_DL | A1_AT | ASSIST1 |
+--------+-------+-------+---------+

  31:30   29:27   26:25   24:22   21:20   19:17   16:15   14:12   11:10    9:7     6:5     4:2     1:0
+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| E_DL  |  AT6  |  DL6  |  AT5  |  DL5  |  AT4  |  DL4  |  AT3  |  DL3  |  AT2  |  DL2  |  AT1  |  DL1  |
+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
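
The per-specifier access type and data length fields in the low 30 bits of the ROM entry follow a regular five-bit stride, which the sketch below uses to unpack them. The function names are hypothetical. Treating SP_CNT (bits <48:46>) as the count of meaningful specifier slots, and clamping it to six, is an assumption of this sketch.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Extract bits <hi:lo> of an integer ROM entry."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def specifier_slots(rom_entry: int) -> list:
    """Return [(AT, DL), ...] for specifiers 1..SP_CNT of a ROM entry."""
    sp_cnt = min(bits(rom_entry, 48, 46), 6)   # assumption: at most six slots used
    slots = []
    for n in range(1, sp_cnt + 1):
        base = 5 * (n - 1)                     # DLn at <base+1:base>, ATn at <base+4:base+2>
        slots.append((bits(rom_entry, base + 4, base + 2),
                      bits(rom_entry, base + 1, base)))
    return slots


def exec_dispatch_bits(rom_entry: int) -> int:
    """Place EXEC_DISP in bits <9:1> of a control store address; other bits are 0."""
    return bits(rom_entry, 64, 56) << 1
```

For example, specifier 6 lands at base 25, giving DL6 in bits <26:25> and AT6 in bits <29:27>.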



Table 6-6: Ibox Instruction ROM Fields 

Bit Position  Microword Field  Description 
64:56         EXEC_DISP        Bits <9:1> of the instruction entry point address in the Ebox 
                               control store 
55            VS               Determines whether a Vfield specifier occupies 1 or 2 source 
                               queue entries 
54:53         ST_SPCQ          Determines whether the parser is stopped at the end of the 
                               instruction, when the next PC queue entry is made, and when 
                               the parser is restarted 
52            DS               Specifies the length (byte or word) of a branch displacement for 
                               the instruction 
51            B                Specifies whether the instruction has a branch displacement 
50            V                Not currently used 
49            FB               Specifies whether this instruction is implemented in the Fbox 
48:46         SP_CNT           Specifies the number of real specifiers for the instruction 
45            A_CNT            Specifies whether the instruction has an assist 
44:41         A1_REG           Specifies the register to use for instructions with an assist 
40:39         A1_DL            Specifies the data length to use for instructions with an assist 
38:36         A1_AT            Specifies the access type to use for instructions with an assist 
35:32         ASSIST1          Specifies the type of assist for instructions with an assist 
31:30         E_DL             Specifies the initial Ebox data length to be used for the 
                               instruction 
29:27         AT6              Supplies the encoded access type of the sixth specifier, if any 
26:25         DL6              Supplies the encoded data length of the sixth specifier, if any 
24:22         AT5              Supplies the encoded access type of the fifth specifier, if any 
21:20         DL5              Supplies the encoded data length of the fifth specifier, if any 
19:17         AT4              Supplies the encoded access type of the fourth specifier, if any 
16:15         DL4              Supplies the encoded data length of the fourth specifier, if any 
14:12         AT3              Supplies the encoded access type of the third specifier, if any 
11:10         DL3              Supplies the encoded data length of the third specifier, if any 
9:7           AT2              Supplies the encoded access type of the second specifier, if any 
6:5           DL2              Supplies the encoded data length of the second specifier, if any 
4:2           AT1              Supplies the encoded access type of the first specifier, if any 
1:0           DL1              Supplies the encoded data length of the first specifier, if any 

6.4 Revision History 



Table 6-7: Revision History 

Who              When          Description of change 
Debra Bernstein  06-Mar-1989   Release for external review. 
Mike Uhler       13-Dec-1989   Update for second-pass release. 
Mike Uhler       04-Feb-1991   Update after pass 1 PG. 

Chapter 7 
The Ibox 



7.1 Overview 
7.1.1 Introduction 

This chapter describes the Ibox section of the NVAX CPU chip. The 4-stage Ibox pipeline (S0..S3) 
runs semi-autonomously to the rest of the NVAX CPU and supports the following functions: 

• Instruction Stream Prefetching 

The Ibox attempts to maintain sufficient instruction stream data to decode the next 
instruction or operand specifier. 

• Instruction Parsing 

The Ibox identifies the instruction opcodes and operand specifiers, and extracts the 
information necessary for further processing. 

• Operand Specifier Processing 

The Ibox processes the operand specifiers, initiates the required memory references, and 
provides the Ebox with the information necessary to access the instruction's operands. 

• Branch Prediction 

Upon identification of a branch opcode, the Ibox hardware predicts the direction of the branch 
(taken vs. not taken). For branch taken predictions, the Ibox redirects the instruction 
prefetching and parsing logic to the branch destination, where instruction processing resumes. 

Figure 7-1 is a top level block diagram of the Ibox showing the major Ibox sub-sections and their 
inter-connections. 

This chapter presents a high-level description of the Ibox functions, then provides details of the 
Ibox sub-sections which support each function. 



Figure 7-1: Ibox Block Diagram 

7.1.2 Functional Overview 

The Ibox fetches, parses, and processes the instruction stream, attempting to maintain a constant 
supply of parsed VAX instructions available to the Ebox for execution. The pipelined nature of 
the NVAX CPU allows for multiple macroinstructions to reside within the CPU at various stages 
of execution. The Ibox, running semi-autonomously to the Ebox, parses the macroinstructions 
following the instruction that is currently in Ebox execution. Performance gains are realized 
when the time required for instruction parsing in the Ibox is hidden during the Ebox execution of 
an earlier instruction. The Ibox places the information generated while parsing ahead into Ebox 
queues. 

The Instruction Queue contains instruction specific information which includes the instruction 
opcode, a floating point instruction flag, and an entry point for the Ebox microcode. 

The Source Queue contains information about the source operands for the instructions in the 
instruction queue. Source queue entries contain either the actual operand (as in a short literal), 
or a pointer to the location of the operand. 

The Destination Queue contains information required for the Ebox to select the location for 
execution results storage. The two possible locations are the VAX General Purpose Registers 
(GPRs) and memory. 

These queues allow the Ibox to work in parallel with the Ebox. As the Ebox consumes the entries 
in the queues, the Ibox parses ahead adding more. In the ideal case, the Ibox would stay far 
enough ahead of the Ebox such that the Ebox would never have to stall because of an empty 
queue. 
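
The decoupling that these queues provide can be illustrated with a bounded FIFO, as sketched below. The class name and the depth used are hypothetical; actual queue depths are not given in this section.

```python
from collections import deque


class BoundedQueue:
    """A fixed-depth FIFO standing in for one Ibox-to-Ebox queue."""

    def __init__(self, depth: int):
        self.depth = depth
        self.entries = deque()

    def can_accept(self) -> bool:
        """The producer (Ibox) may parse ahead only while there is room."""
        return len(self.entries) < self.depth

    def push(self, entry) -> None:
        if not self.can_accept():
            raise OverflowError("queue full: producer must wait")
        self.entries.append(entry)

    def pop(self):
        """Return the oldest entry, or None to model a consumer (Ebox) stall."""
        return self.entries.popleft() if self.entries else None
```

In this model the Ebox stalls only when pop returns None, which is the case the Ibox tries to avoid by staying ahead.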

The Ibox needs access to memory for instruction and operand data. Instruction and operand data 
requests are made through a common port to the Mbox. All data for both the Ibox and the Ebox 
is returned on a shared M%MD_BUS_H<63:0>. 

The Ibox port feeds operand data requests to the Mbox Specifier Request Latch and instruction 
data requests to the Mbox Instruction Request Latch. These 2 latches allow the Ibox to issue 
memory requests for both instruction and operand data even though the Mbox may be processing 
other requests. 

The Ibox supports 4 main functions: 

1. Instruction Stream Prefetching 

2. Instruction Parsing 

3. Operand Specifier Processing 

4. Branch Prediction 

Instruction Stream Prefetching works to provide a steady source of instruction stream data for 
instruction parsing. While the instruction parsing logic works on one instruction, the instruction 
prefetching logic fetches several instructions ahead. 

The Instruction Parsing logic parses the incoming instruction stream, identifying and performing 
initial processing on each of the instruction's components. The instruction opcodes and associated information 
are passed directly into the Ebox instruction queue. Operand specifier information is passed on 
to the operand specifier processing logic. 

The Operand Specifier Processing logic locates the operands in registers, in memory, or in the 
Instruction Stream. This logic places operand information in the Ebox source and destination 
queues, and makes the required operand memory requests. 

The Ibox does not have prior knowledge of branch direction for branches which rely on Ebox 
condition codes. The Branch prediction logic makes a prediction on which way the branch will go 
and forces the Ibox to take that path. This logic saves the alternate branch path target, so that 
in the event that Ebox branch execution shows that the prediction was wrong, the Ibox can be 
redirected to the correct branch direction. 

7.1.3 The Pipeline 

The Ibox logic spans the first 4 segments of the NVAX CPU pipeline (S0..S3). The following table 
lists the major Ibox sub-sections and which pipe segments they occupy. 



Table 7-1: Ibox Pipeline 

Sub-Section 
Name         Description 

S0 Pipe Stage 
VIC          The Virtual Instruction Cache is a 2KB direct mapped I-stream-only cache with 32 byte 
             blocks, a valid bit per quadword, and an access size of 8 bytes. 
PFQ          The Prefetch Queue is a queue of instruction stream data supplied by the VIC. It is 4 bytes 
             wide by 4 elements deep. 

S1 Pipe Stage 
IBU          The Instruction Burst Unit breaks up the incoming instruction data into opcodes, operand 
             specifiers, specifier extensions, and branch displacements and passes the results to other 
             parts of the Ibox for further processing. 
IIU          The Instruction Issue Unit takes the opcodes provided by the IBU and generates Ebox 
             microcode dispatch addresses and other context for instruction execution. 
BPU          The Branch Prediction Unit predicts whether or not branches will be taken and redirects 
             the Ibox instruction processing as necessary. 
OQU          The Operand Queue Unit is the interface to the Ebox source and destination queues. 
SBU          The Scoreboard Unit tracks outstanding read and write references to the GPRs. 
CSU (S1)     This segment of the Complex Specifier Unit contains the microsequencer and control store. 

S2 Pipe Stage 
CSU (S2)     This is the register READ segment of the complex specifier unit. It accesses the necessary 
             registers and provides the data to the ALU in the next pipe stage. 

S3 Pipe Stage 
CSU (S3)     This is the ALU and WRITE segment of the complex specifier unit. This segment performs 
             the necessary ALU operations and writes the results either to the Ebox register file or to 
             local temporary registers. This segment also contains the Mbox interface. 



Pipe segment S0 is dedicated to supplying a steady stream of instruction data for use by the IBU. 
When prefetching is enabled, the VIC attempts to fill the PFQ with up to 8 bytes of instruction 
stream data. 

As the IBU parses instructions in S1, the Ebox receives information about the instruction and its 
operands in the instruction, source, and destination queues. The IIU is the Ibox interface to the 
Ebox instruction queue, and the OQU is the interface to the source and destination queues. When 
the IBU has identified a new opcode, this opcode is passed to the IIU, which places the necessary 
opcode-specific information in the Ebox instruction queue. When operand specifiers are identified, 
the OQU places the necessary operand-specific information in the source and destination queues. 

The CSU is a 3 stage (S1..S3) microcoded pipeline dedicated to handling operand specifiers which 
require complex processing and/or access to memory. It has read and write access to the Ebox 
register file and a port to the Mbox. Memory requests from the VIC are received at the CSU and 
forwarded to the Mbox when there is a cycle free of specifier memory requests. 

7.2 Instruction Stream Prefetching 

The Instruction Stream Prefetching mechanism provides a buffer of Istream data 4 bytes wide and 
4 elements deep for use by the instruction parser. This buffer insulates the instruction parser from 
the bursty behavior of the cache and memory sub-systems, and allows for the parallel operation 
of the instruction fetching and instruction parsing functions. 

The two Ibox sub-sections which support the instruction prefetching function are the Virtual 
Instruction Cache (VIC) and the Prefetch Queue (PFQ), both of which reside in the S0 pipe stage. 



7.2.1 The VIC 

The VIC is a 2KB, direct-mapped, Istream cache which acts as the primary source of instruction 
stream data for the Ibox. The VIC attributes are summarized in Table 7-2. 



Table 7-2: VIC Attributes 



Attribute          Value 
-----------------  ------------------------------------------------ 
Cache size         2K Bytes 
Access Type        Direct Mapped 
Block Size         32 Bytes 
Sub-block Size     8 Bytes 
Valid Bits         4 Valid bits/Cache Block (1 Per Sub-block) 
Data Parity Bits   4 Even Parity bits/Cache Block (1 Per Sub-block) 
# of Tags          64 Tags 
Tag Parity Bit     1 Even Parity Bit Per Tag 
Fill Algorithm     Fill Forward 
Access Size        8 Bytes 
Bus Size           8 Bytes 
Prefetching        NONE 
Data stored        Istream Only 
Virtual/Physical   Virtual 
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Figure 7-2: VIC Block Diagram 




The VIC is a virtual cache because the addresses that are used to index into the cache are 
untranslated VAX Virtual addresses. See Section 12.5 for more on VAX Memory Management 
and Address Translation. The VIC maintains a local prefetch pointer called VIBA<31:3> (Virtual 
Instruction Buffer Address). This address is quadword aligned and always points to the next 
quadword of Istream data to be sent to the PFQ. Table 7-3 shows the fields in VIBA. 
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Table 7-3: VIBA bit fields 



Bitfield  Field name     Description 
--------  -------------  ---------------------------------------------------- 
<4:3>     SUBBLK_INDEX   Sub-block index (or column select) bits indicate which 
                         sub-block to select from cache block. 
<10:5>    ROW_INDEX      Row select bits determine which cache row to access. 
<31:11>   VIBA_TAG       Bits to be compared against cache tag. 



Whenever the BPU issues a new PC, the VIC latches the NEW_PC<31:3> in VIBA<31:3>. VIBA<10:5> 
are used to select which cache row to access. Each cache row, shown in Figure 7-3, stores a 21- 
bit tag with even parity for the tag, and four quadword sub-blocks each with a valid bit and an 
even parity bit which covers the data only. When a cache row of the VIC is accessed, the 21-bit 
tag is compared with VIBA<31:11> to determine cache hit or miss. VIBA<4:3> selects the cache 
sub-block. 
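
The field boundaries of Table 7-3 can be illustrated with a short software sketch. This is only a model of the address split, not the hardware; the function name split_viba and the dictionary keys are hypothetical.

```python
def split_viba(viba: int) -> dict:
    """Split a 32-bit virtual address into the three VIBA fields of Table 7-3."""
    return {
        "sub_block": (viba >> 3) & 0x3,    # VIBA<4:3>: one of 4 quadword sub-blocks
        "row": (viba >> 5) & 0x3F,         # VIBA<10:5>: one of 64 cache rows
        "tag": (viba >> 11) & 0x1FFFFF,    # VIBA<31:11>: 21 bits compared with the tag
    }


fields = split_viba(0x12358)
print(fields)
```

The three widths (2 + 6 + 21 bits) plus the three ignored low-order bits account for all 32 address bits, which is why 64 tags of 21 bits each cover the 2KB array.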

Figure 7-3: VIC Cache Row Format 



     63              0     63              0     63              0     63              0 
|V|P| Sub-block 3 data |V|P| Sub-block 2 data |V|P| Sub-block 1 data |V|P| Sub-block 0 data | 

287 bits 



Whenever space exists in the PFQ, the VIC attempts to supply the next quadword of instruction 
stream data by doing a VIC_READ using the current value of VIBA<31:3>. If the VIC_READ results 
in a miss, the VIC begins a VIC_FILL sequence by sending a request through the CSU for a cache 
fill operation from the Mbox. 

7.2.1.1 VIC Control 

The VIC control evaluates the status flags summarized in Table 7-4 every cycle to determine 
the proper type of cache sequence for the next cycle. VIC_ENABLE enables the cache itself, 
specifically VIC_READs and VIC_WRITEs. PREFETCH_ENABLE is the enable bit for the Istream 
prefetch sequencer. VIC_ERROR indicates that there was a VIC parity error. MBOX_ERROR 
indicates that error status was reported by the Mbox. WRITE_PENDING indicates that the Mbox 
drove valid Istream data on the M%MD_BUS_H<63:0> last cycle, and a cache write cycle should 
begin next. The MISS_PENDING flag is set when a VIC_READ misses in cache, and remains set 
until the cache fill sequence terminates. LOAD_VIC_DATA indicates that VIC data is ready for the 
PFQ. LOAD_MD_DATA indicates that the data on the M%MD_BUS_H<63:0> during a VIC fill should 
be loaded into the PFQ. 

Table 7-4: VIC Status Flags 

VIC Flag         Meaning 
---------------  ------------------------------------------------------------ 
VIC_ENABLE       The VIC enable bit 
PREFETCH_ENABLE  The prefetch enable bit 
VIC_ERROR        There was a parity error in the VIC 
MBOX_ERROR       There was an error in the Mbox fetching Istream data 
WRITE_PENDING    Valid data latched from M%MD_BUS_H<63:0>, ready to be written 
                 to the VIC 
MISS_PENDING     A VIC cache fill from the Mbox is in progress 
VIC_READ         A cache read from the VIC is in progress 



7.2.1.2 VIC Reads 

The VIC starts a VIC_READ sequence when PREFETCH_ENABLE is set and WRITE_PENDING is clear. 
If VIC_ENABLE is set, the VIC_READ sequence accesses the cache using the address in VIBA<31:3>. 
The decode of VIBA<10:5> selects one of 64 cache rows. If TAG<20:0> matches VIBA<31:11> and 
the valid (V) bit for the sub-block selected by VIBA<4:3> is set, then there is a cache hit. The data 
from the sub-block selected by VIBA<4:3> is driven onto VIC_DATA_BUS<63:0>, LOAD_VIC_DATA is 
asserted if the PFQ is not full, and the data is loaded into the PFQ. 

If VIBA<31:11> does not match TAG<20:0>, or the tag matches but the V bit for the selected 
sub-block is not set, then a cache miss has occurred. In this case, VIBA<31:3> is saved in 
MISS_ADDRESS<31:3> and the MISS_PENDING flag is set. The four data parity bits for the accessed 
cache block are latched in MISS_PARITY<3:0>. The four valid bits for the same cache block are 
latched in MISS_VALID<3:0> if the cache miss is caused by a clear sub-block valid bit. If the cache 
miss is caused by a tag miscompare then MISS_VALID<3:0> is cleared. VIC_WRITEs make use of 
MISS_ADDRESS<31:3>, MISS_PARITY<3:0>, and MISS_VALID<3:0>. A cache fill operation begins as 
described in Section 7.2.1.3. 

If VIC_ENABLE is clear or the LOCK bit in the ICSR register is set, indicating a VIC parity error 
has occurred, then all VIC_READs are forced to miss. 
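
The hit and miss rules above can be summarized in a small behavioral sketch. It is a software model under stated assumptions, not a description of the circuit: the Row class, the vic_read function and the returned dictionary are hypothetical, and what is latched on a forced miss (VIC_ENABLE clear or LOCK set) is not stated in this section, so the model simply reuses the normal miss path.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    """One VIC row: a 21-bit tag plus per-sub-block valid, parity and data."""
    tag: int = 0
    valid: list = field(default_factory=lambda: [False] * 4)
    parity: list = field(default_factory=lambda: [0] * 4)
    data: list = field(default_factory=lambda: [0] * 4)


def vic_read(rows, viba, vic_enable=True, lock=False):
    """Return ("hit", quadword) or ("miss", latched miss state)."""
    sub = (viba >> 3) & 0x3                      # VIBA<4:3>
    row = rows[(viba >> 5) & 0x3F]               # VIBA<10:5>
    tag_match = row.tag == ((viba >> 11) & 0x1FFFFF)
    forced_miss = (not vic_enable) or lock
    if not forced_miss and tag_match and row.valid[sub]:
        return "hit", row.data[sub]
    # Miss: save the address and parity bits; keep the valid bits only when
    # the tag matched (the miss was caused by a clear sub-block valid bit).
    # Assumption: a forced miss latches state the same way.
    return "miss", {
        "MISS_ADDRESS": viba & 0xFFFFFFF8,
        "MISS_PARITY": list(row.parity),
        "MISS_VALID": list(row.valid) if tag_match else [False] * 4,
    }
```

Clearing MISS_VALID on a tag miscompare is what invalidates the other three sub-blocks of the old block once the first VIC_WRITE of the fill replaces the tag.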

7.2.1.3 VIC Fills 

Upon detection of a cache miss during a VIC_READ, the VIC issues a fill request to the CSU. The 
miss address, stored in MISS_ADDRESS<31:3>, is driven onto VIC_REQ_ADDR<31:3> and VIC_REQ 
is asserted. The CSU forwards the VIC_REQ to the Mbox during the next free cycle on the 
I%IBOX_ADDR_H<> bus and associated control lines. 

The Mbox returns quadwords of instruction data starting with the requested quadword and 
continuing to the end of the block. This cache fill algorithm is called fill forward. If the Mbox 
goes off-chip to get the requested data, then a full cache block of instruction data is returned, 
but not necessarily in any particular order. If the Mbox processes the fill request and finds that 
the request resides in I/O space, the request is also sent off-chip. In this case only the single 
requested quadword of data returns to the VIC. In all cases, the VIC is unaware of the number of 
data blocks being returned. When the last block of data is being returned by either the Cbox or 
Mbox, a M%LAST_FILL_H is signaled allowing MISS_PENDING to be cleared and a new read begun. 
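
A minimal sketch of the fill-forward ordering and of why the VIC depends on M%LAST_FILL_H follows. The function names are hypothetical. The sketch assumes that the fill-forward order applies when the data is found on-chip, since the text states that off-chip fills return a full block in no particular order.

```python
def fill_forward_order(requested_qw: int) -> list:
    """Quadword indices returned by a fill-forward, starting at the requested one."""
    if not 0 <= requested_qw <= 3:
        raise ValueError("quadword index must be 0..3")
    return list(range(requested_qw, 4))


def drain_fill(returns):
    """Consume (qw_index, last_fill) pairs in arrival order.

    The consumer does not know how many quadwords to expect; it stops only
    when the pair flagged as the last fill (modeling M%LAST_FILL_H) arrives.
    """
    written = []
    for qw_index, last_fill in returns:
        written.append(qw_index)
        if last_fill:
            break
    return written


print(fill_forward_order(2))
print(drain_fill([(3, False), (0, False), (1, False), (2, True)]))
```

The second call models an off-chip fill returning all four quadwords out of order; a single pair with last_fill set models the I/O space case.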
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7.2.1.4 VIC Writes 

The assertion of M%VIC_DATA_L indicates the presence of Istream data on M%MD_BUS_H<63:0>. 
The VIC latches M%MD_BUS_H<63:0> in FILL_DATA<63:0>, M%MD_BUS_QW_PARITY_L<0> in 
FILL_DATA_PARITY<0>, M%QW_ALIGNMENT_H<1:0> in MISS_ADDRESS<4:3>, and sets WRITE_PENDING. 

If VIC_ENABLE is set, then a VIC_WRITE commences the next cycle using the address stored in 
MISS_ADDRESS<31:3> and the data stored in FILL_DATA<63:0>. MISS_ADDRESS<10:5> selects 
the cache row to write and MISS_ADDRESS<4:3> selects the sub-block to write. TAG<20:0> 
and its parity bit for the selected row are written with MISS_ADDRESS<31:11> and the even 
parity calculated for these bits. The selected sub-block is written with FILL_DATA<63:0>. 
MISS_PARITY<3:0> and MISS_VALID<3:0> contain the four data parity bits and four valid 
bits for the cache block being filled. The parity bit in MISS_PARITY<3:0> indexed by 
MISS_ADDRESS<4:3> is associated with the sub-block being written. This parity bit is written with 
M%MD_BUS_QW_PARITY_L<0>. The valid bit in MISS_VALID<3:0> indexed by MISS_ADDRESS<4:3> 
is associated with the sub-block being written. This valid bit is set. Both MISS_PARITY<3:0> and 
MISS_VALID<3:0> are written into the cache array. 

There may be up to four VIC_WRITEs for each VIC_FILL depending upon sub-block alignment and 
fill sequence. However, the cache block tag and tag parity, all four data parity bits, all four data 
valid bits, and one sub-block of data are all written with every VIC_WRITE. 
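
The write sequence can be sketched as a single update of one cache row. This is a simplified model: the row is a plain dictionary, the names vic_write and even_parity are hypothetical, signal polarity (the _L suffix on the bus parity signal) is ignored, and even parity is assumed to mean that the parity bit makes the total number of one bits even.

```python
def even_parity(value: int) -> int:
    """Parity bit that makes the total count of 1 bits even (assumed convention)."""
    return bin(value).count("1") & 1


def vic_write(row, miss_address, miss_parity, miss_valid, fill_data, fill_parity):
    """Apply one VIC_WRITE to `row`, updating the MISS_* lists in place."""
    sub = (miss_address >> 3) & 0x3            # MISS_ADDRESS<4:3>
    tag = (miss_address >> 11) & 0x1FFFFF      # MISS_ADDRESS<31:11>
    miss_parity[sub] = fill_parity             # parity bit that arrived with the data
    miss_valid[sub] = True                     # sub-block being written becomes valid
    # Everything below is rewritten on every VIC_WRITE.
    row["tag"] = tag
    row["tag_parity"] = even_parity(tag)
    row["data"][sub] = fill_data
    row["parity"] = list(miss_parity)
    row["valid"] = list(miss_valid)


row = {"tag": 0, "tag_parity": 0, "data": [0] * 4, "parity": [0] * 4, "valid": [True] * 4}
miss_parity, miss_valid = [0] * 4, [False] * 4   # state after a tag miscompare
vic_write(row, 0x12358, miss_parity, miss_valid, 0xABCD, 1)
print(row["valid"])
```

Because all four valid bits are rewritten from MISS_VALID on every write, the first write after a tag miscompare leaves only the newly filled sub-block valid.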

If VIC_ENABLE is clear, VIC_WRITEs are disabled, but the cache fill sequence completes normally. 

See Section 7.2.1.7 for information on M%HARD_ERR_H and M%MME_FAULT_H. 

7.2.1.5 VIC Bypass 

When fill data arrives at the VIC on the M%MD_BUS_H<63:0>, an evaluation is done to determine 
if the incoming data should be loaded directly into the PFQ. If so, then the PFQ latches the data 
directly from the M%MD_BUS_H<63:0> and VIBA is incremented by 8. This action is referred to as a 
VIC bypass and is signaled to the PFQ by LOAD_MD_DATA. Note that a VIC_WRITE occurs regardless 
of the outcome of the evaluation and whether or not the VIC bypass is enabled. If PFQ_FULL from 
the PFQ is asserted, indicating the PFQ is full, then LOAD_MD_DATA is not asserted and VIBA is 
not incremented. 

The evaluation consists of checking to make sure that the incoming data is for the same 
cache block and sub-block to which VIBA points. The only time VIBA can be pointing to a 
different block than the block for which data is returning is if a previous VIC bypass or 
Hit-Under-Miss incremented VIBA across a cache block boundary. This circumstance is indicated 
by a VIBA_NEW_BLOCK flag. 

In order to facilitate VIC bypass, the Mbox returns M%QW_ALIGNMENT_H<1:0> with each piece of 
fill data. These two bits represent the quadword index for this data within the hexaword cache 
block. If VIBA_NEW_BLOCK is clear and M%QW_ALIGNMENT_H<1:0> match VIBA<4:3> then the 
incoming data can be loaded into the PFQ. When VIBA_NEW_BLOCK is set, indicating that the data 
the PFQ is waiting for is not in the block being filled by the Mbox, then VIC bypass is blocked and 
LOAD_MD_DATA is not asserted. 
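
The bypass evaluation reduces to a three-way condition, which the following sketch models. The function vic_bypass and its tuple result are hypothetical, and the sketch covers only the conditions described in this section; the blocking of bypass by the error flags of Section 7.2.1.7 is not modeled.

```python
def vic_bypass(viba, qw_alignment, viba_new_block, pfq_full):
    """Decide whether incoming fill data is loaded straight into the PFQ.

    Returns (load_md_data, next_viba, next_viba_new_block).
    """
    same_sub_block = qw_alignment == ((viba >> 3) & 0x3)   # compare with VIBA<4:3>
    load = same_sub_block and not viba_new_block and not pfq_full
    if not load:
        return False, viba, viba_new_block
    next_viba = (viba + 8) & 0xFFFFFFF8                    # advance one quadword
    crossed = (next_viba >> 5) != (viba >> 5)              # left the block being filled
    return True, next_viba, crossed


print(vic_bypass(0x12358, 3, False, False))
```

In the printed case the bypass consumes the last quadword of the block, so VIBA advances into the next block and the new-block flag is raised, which blocks further bypasses for the fill in progress.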
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7.2.1 .6 VIC Hits Under Miss 

If the last VIC_WRITE was also a VIC bypass condition, then VIBA increments and potentially points 
to valid data in the current or next cache block. A subsequent VIC_READ is permitted even when 
MISS_PENDING is still set. This is referred to as a VIC Hit-Under-Miss. If the VIC_READ during 
MISS_PENDING also misses, no cache fill request is started. MISS_ADDRESS, MISS_PARITY<3:0>, 
and MISS_VALID<3:0> are not updated on a second miss. Note that VIC_READs may start and stop 
during a fill sequence based on VIC_WRITEs, but they always restart at the termination of a fill 
sequence when M%LAST_FILL_H is signaled. 

7.2.1.7 VIC Exceptions and Errors 

The VIC interprets the Mbox exception and error signals during the VIC_WRITE sequence. The 
M%MME_FAULT_H signal indicates that the Mbox encountered a memory management exception 
during the processing of an instruction stream reference. The Mbox produces the M%HARD_ERR_H 
signal when a hardware error is detected during the processing of an instruction stream reference. 
When M%VIC_DATA_L indicates the presence of data from the Mbox on the M%MD_BUS_H<63:0>, the 
assertion of either M%MME_FAULT_H or M%HARD_ERR_H blocks the setting of the WRITE_PENDING 
flag. M%MME_FAULT_H and M%HARD_ERR_H set the error flags IMMGT_EXC and MHARD_ERR, 
respectively. These flags are sent directly to the IBU. They are also used to disable prefetching 
and block VIC bypass until they are cleared either by an E%STOP_IBOX_H from the Ebox or a 
LOAD_NEW_PC from the BPU. They are also cleared by E%IBOX_LOAD_PC_L, which indicates an 
impending LOAD_NEW_PC. 

The VIC checks tag and data parity during VIC_READs. Parity is calculated for the data sub-block 
selected by VIBA<4:3>. The even parity value for the quadword of data is then compared to 
the parity (P) bit associated with the sub-block read from cache. Data parity miscompares are 
reported as parity errors only on valid data. The even parity value for VIC_TAG<20:0> is calculated 
on VIC_READs and compared to the parity (P) bit from the array that is associated with the tag 
read. Tag parity miscompares are always reported as parity errors. When the VIC detects either 
parity error, it clears PREFETCH_ENABLE, disabling VIC prefetching, and sets the LOCK bit in the 
ICSR register, preventing further cache reads and writes. The VIC asserts IHARD_ERR to forward 
the error condition to the IBU. IHARD_ERR remains asserted until it is cleared by an E%STOP_IBOX_H. 
The error status bits are set appropriately in the ICSR IPR register and the address of the error 
is latched in the VMAR register, as explained in Section 7.2.1.16. In addition, the VIC requests a 
system soft error interrupt by asserting I%IBOX_S_ERR_L. 

VIC tag and data parity checking are done specifically to protect the data in the VIC arrays. 
Refer to Section 7.9.2 for details on the IBU handling of Istream errors. 
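
The parity rules for VIC_READs can be stated compactly in software. The sketch below is illustrative only: check_read is a hypothetical name, and even parity is assumed to mean that the stored bit equals the XOR of the protected bits, so that data plus parity bit contain an even number of ones.

```python
def even_parity(value: int) -> int:
    """XOR of all bits of `value` (assumed even-parity convention)."""
    return bin(value).count("1") & 1


def check_read(tag, tag_parity, data, data_parity, valid):
    """Return (tag_error, data_error) for one VIC_READ.

    Tag miscompares are always reported; data miscompares are reported
    only when the selected sub-block is valid.
    """
    tag_error = even_parity(tag & 0x1FFFFF) != tag_parity
    data_error = bool(valid) and even_parity(data) != data_parity
    return tag_error, data_error


print(check_read(0x24, 0, 0x1, 0, valid=False))
print(check_read(0x24, 0, 0x1, 0, valid=True))
```

The two calls differ only in the valid bit: the same data parity miscompare is ignored for an invalid sub-block and reported for a valid one.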

7.2.1.8 PC Load Effects 

The assertion of LOAD_NEW_PC by the BPU has the following effects: 

1. PREFETCH_ENABLE is set. 

2. VIBA is loaded. 

VIBA<31:3> is loaded from the global Ibox bus NEW_PC<31:3>. 

3. MHARD_ERR is cleared. 

4. IMMGT_EXC is cleared. 

5. MISS_PENDING is cleared. 

6. WRITE_PENDING is cleared. 

7. VIC_READ is set. 

8. I%FLUSH_IREF_LAT_H is asserted by the BPU to the Mbox. 

The VIC reacts to any LOAD_NEW_PC from the BPU on a cycle by cycle basis as follows: 
Cycle N : 

• The Ibox may make an Istream request this cycle. 

• Fill data returning from the Mbox to the Ibox is ignored. 

Cycle N+1 : 

• LOAD_NEW_PC is asserted to redirect instruction flow. 

• I%FLUSH_IREF_LAT_H is asserted to clear outstanding Istream references. 

• The Ibox may make an Istream request this cycle which is ignored by the Mbox. 

• Fill data returning from the Mbox to the Ibox is ignored. This is the last cycle in which fill 
data for the Istream being flushed can be sent. 

• Prefetching is enabled if previously disabled. 

• MISS_PENDING is cleared and VIC_READ is set. 

• New VIC hit or miss is determined. 

Cycle N+2 : 

• The Ibox may make a new Istream request based on whether the VIC hit or missed. 

• MISS_PENDING may be set and VIC_READ cleared if a VIC miss was determined. 

• The Mbox may not send Istream data for the old Istream request to the Ibox. 

Section 7.6 and Section 7.5.1.7 explain more about PC loads. 

7.2.1.9 E%STOP_IBOX_H Effects 

The assertion of E%STOP_IBOX_H by the Ebox has the following effects: 

1. PREFETCH_ENABLE is cleared. 

2. MHARD_ERR is cleared. 

3. IMMGT_EXC is cleared. 

4. IHARD_ERR is cleared. 

5. MISS_PENDING is cleared. 

6. WRITE_PENDING is cleared. 

7. VIC_READ is cleared. 

8. I%FLUSH_IREF_LAT_H is asserted by the BPU. 

The VIC reacts to an E%STOP_IBOX_H on a cycle by cycle basis as follows: 
Cycle N : 

• E%STOP_IBOX_H is asserted. 

• The Ibox may make an Istream request this cycle. 

• Fill data returning from the Mbox to the Ibox is ignored. 

Cycle N+1 : 

• I%FLUSH_IREF_LAT_H is asserted to clear outstanding Istream references. 

• The Ibox will not make an Istream request this cycle. 

• Fill data returning from the Mbox to the Ibox is ignored. This is the last cycle in which fill 
data for the Istream being flushed can be sent. 

• Prefetching is disabled. 

• MISS_PENDING and VIC_READ are cleared. The VIC is put into an idle state, waiting for an 
E%IBOX_LOAD_PC_L from the Ebox. 

7.2.1.10 Prefetch Stop Conditions 

PREFETCH_ENABLE is cleared in the following cases: 

1. Any VIC error, Mbox error, or Mbox exception 

when a VIC error is detected or an Mbox error is reported. 

2. E%STOP_IBOX_H signaled by Ebox 

when the Ebox microcode performs a MISC/RESET_CPU which asserts E%STOP_IBOX_H. 

3. STOP_VIC_PREFETCH, STOP_PARSER bit from the IROM 

stops Ibox prefetching for those instructions expected to redirect the instruction flow or access 
the IPRs. 

7.2.1.11 Prefetch Start Conditions 

PREFETCH_ENABLE is set in the following cases: 

1. PC load 

on all PC loads. 

2. E%RESTART_IBOX_H signaled by Ebox 

when the Ebox microcode performs an E%RESTART_IBOX_H, unless there is an outstanding VIC 
or Mbox error, or a PC load by the Ebox is pending, as signaled by E%IBOX_LOAD_PC_L. 

7.2.1.12 Prioritized List of Prefetch Start/stop Conditions 

The following priority is followed when multiple prefetch start/stop conditions occur simultaneously. 

1. E%STOP_IBOX_H - stops prefetching 

2. PC Load - starts prefetching 

3. E%IBOX_LOAD_PC_L - stops prefetching (a PC load is pending) 

4. Any VIC or Mbox Error or Exception - stops prefetching 

5. E%RESTART_IBOX_H - starts prefetching 

6. STOP_VIC_PREFETCH - stops prefetching 
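
The priority list can be read as a first-match table. The sketch below models it that way; the names PRIORITY and next_prefetch_enable, and the short condition labels used as set members, are hypothetical, and holding the current value when no condition is present is an assumption.

```python
# (condition, resulting PREFETCH_ENABLE), highest priority first.
PRIORITY = [
    ("E%STOP_IBOX_H", False),
    ("PC_LOAD", True),
    ("E%IBOX_LOAD_PC_L", False),      # a PC load is pending
    ("VIC_OR_MBOX_ERROR", False),     # any VIC or Mbox error or exception
    ("E%RESTART_IBOX_H", True),
    ("STOP_VIC_PREFETCH", False),
]


def next_prefetch_enable(current: bool, asserted: set) -> bool:
    """Return the new PREFETCH_ENABLE given the conditions asserted this cycle."""
    for condition, enable in PRIORITY:
        if condition in asserted:
            return enable
    return current   # assumption: the flag holds when nothing is asserted


print(next_prefetch_enable(True, {"PC_LOAD", "STOP_VIC_PREFETCH"}))
print(next_prefetch_enable(False, {"E%RESTART_IBOX_H", "VIC_OR_MBOX_ERROR"}))
```

The second call shows the restart restriction of Section 7.2.1.11: an outstanding error outranks E%RESTART_IBOX_H, so prefetching stays disabled.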
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7.2.1.13 VIC Enable 

The VIC powers up with VIC_ENABLE clear. VIC_ENABLE can be set and cleared during normal 
operation through the IPR register described in Section 7.2.1.16. VIC_ENABLE is cleared by 
hardware when any VIC parity error is detected. 

MACROCODE RESTRICTION 

In functional operation, an REI must precede the MTPR which enables the VIC in order 
to flush all of the valid bits. However, if all the valid bits are guaranteed to have been 
written with a known value (such as in diagnostics or in macrocode that initializes the 
entire VIC), then this REI may be omitted. 

7.2.1.14 VIC Flushing 

The Ebox asserts E%FLUSH_VIC_H under microcode control to flush the VIC (clear all data valid 
bits). VIC flushes occur in such instances as the REI instruction, machine checks, and certain 
exceptions and interrupts. 

MICROCODE RESTRICTION 

The Ebox microcode guarantees that prefetching is disabled whenever E%FLUSH_VIC_H 
is asserted, either implicitly in the context of an instruction with a STOP_PARSER assist 
or by performing an explicit E%STOP_IBOX_H. 

The VIC reacts to an E%FLUSH_VIC_H on a cycle by cycle basis as follows: 

Cycle N : 

• Prefetching has already been disabled. 

• E%FLUSH_VIC_H is asserted. 

• The Ibox may make an Istream request this cycle. 

• Fill data returning from the Mbox to the Ibox is ignored. 

Cycle N+1 : 

• I%FLUSH_IREF_LAT_H is asserted to clear outstanding Istream references. 

• The Ibox will not make an Istream request this cycle. 

• Fill data returning from the Mbox to the Ibox is ignored. This is the last cycle in which fill 
data for the Istream being flushed can be sent. 

7.2.1.15 Flushing IREFs 

The signal I%FLUSH_IREF_LAT_H is asserted by the BPU whenever a new PC is loaded, indicating 
a redirection of the Istream. It is also asserted whenever there is an E%STOP_IBOX_H or an 
E%FLUSH_VIC_H from the Ebox. In all cases, the Mbox may continue to return VIC fill data in the 
same cycle as the I%FLUSH_IREF_LAT_H, but not the following cycle. The VIC will ignore any fill 
data received in the same cycle or the one cycle previous to the cycle in which I%FLUSH_IREF_LAT_H 
is signaled. 
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7.2.1.16 VIC Control and Error Registers 

The VIC contains 4 internal processor registers (IPRs) which provide VIC control and read/write 
access to the arrays. 

MACROCODE RESTRICTION 

VIC_ENABLE must be cleared before writing to the VIC IPRs: VMAR, VDATA, or VTAG. 
VIC_ENABLE must be cleared before reading from VIC IPRs: VDATA, VTAG. In functional 
operation, an REI must precede the MTPR which enables the VIC. 



See Section 7.4.2.8 for details of the IPR mechanism. 

Figure 7-4: IPR D0 (hex), VMAR 
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Table 7-5: VMAR Field Descriptions 



Name        Extent  Type  Description 
----------  ------  ----  ---------------------------------------------------- 
LW          2       WO    Longword select bit. Selects longword of sub-block 
                          for cache access. 
SUB_BLOCK   4:3     RW    Sub-block select. Selects data sub-block for cache 
                          access; also latches VIBA<4:3> on VIC parity errors. 
ROW_INDEX   10:5    RW    Row select. Row index for read and write access to 
                          cache array; also latches VIBA<10:5> on VIC parity 
                          errors. 
ADDR        31:11   RO    Error address field. Latches tag portion of VIBA on 
                          VIC parity errors. 



When the VIC is disabled, the VIC Memory Address Register (VMAR) may be used as an index 
for direct IPR access to the cache arrays. VMAR<10:5> supply the cache row index, VMAR<4:3> 
supply the cache sub-block, and VMAR<2> indicates the longword within a quadword address. 

VMAR also latches and holds the VIBA<31:3> on VIC array parity errors. 

Figure 7-5: IPR D1 (hex), VTAG 

 31                  11  10   9   8    7    4   3    0 
+----------------------+---+---+----+--------+--------+ 
|         TAG          | 1 | 1 | TP |   DP   |   V    | :VTAG 
+----------------------+---+---+----+--------+--------+ 



Table 7-6: VTAG Field Descriptions 

Name  Extent  Type  Description 
----  ------  ----  ------------------------------------------------------------ 
V     3:0     RW    Data valid bits. Supply data valid bits on array read/writes. 
DP    7:4     RW    Data parity bits. Supply data parity on array read/writes. 
TP    8       RW    Tag parity bit. Supplies tag parity on tag array read/writes. 
TAG   31:11   RW    Tag. Supplies tag on tag array read/writes. 



The VTAG IPR provides read and write access to the cache tag array. An IPR write to VTAG will 
write the contents of the M%MD_BUS_H<63:0> to the tag, parity, and valid bits for the row indexed 
by VMAR<10:5>. VTAG<31:11> are written to the cache tag. VTAG<8> is written to the associated 
tag parity bit. VTAG<7:4> are used to write the four data parity bits associated with the indexed 
cache row. Similarly VTAG<3:0> write the four data valid bits associated with the cache row. 
DP<3:0> and V<3:0> are the data parity and data valid bits, respectively, for the 4 quadwords 
of data in the same row. DP<0> and V<0> correspond to the quadword of data addressed when 
address bits 4:3 = 00, DP<1> and V<1> correspond to the quadword of data addressed when 
address bits 4:3 = 01, etc. 
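
For diagnostic software that builds or inspects VTAG values, the field layout of Table 7-6 can be expressed as a pair of helper routines. These helpers are hypothetical and only illustrate the bit positions; bits <10:9> are left clear here because their required value is not stated in the field descriptions.

```python
def pack_vtag(tag: int, tp: int, dp: int, v: int) -> int:
    """Build a 32-bit VTAG value from TAG<31:11>, TP<8>, DP<7:4> and V<3:0>."""
    return (
        ((tag & 0x1FFFFF) << 11)
        | ((tp & 0x1) << 8)
        | ((dp & 0xF) << 4)
        | (v & 0xF)
    )


def unpack_vtag(value: int) -> dict:
    """Split a 32-bit VTAG value into its documented fields."""
    return {
        "TAG": (value >> 11) & 0x1FFFFF,
        "TP": (value >> 8) & 0x1,
        "DP": (value >> 4) & 0xF,
        "V": value & 0xF,
    }


word = pack_vtag(tag=0x24, tp=1, dp=0b0101, v=0b1000)
print(hex(word), unpack_vtag(word))
```

In the example, V = 1000 (binary) marks only the quadword at address bits 4:3 = 11 as valid, following the DP<n>/V<n> correspondence described above.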



Figure 7-6: IPR D2 (hex), VDATA 



 31                                                          0 
+------------------------------------------------------------+ 
|                            DATA                            | :VDATA 
+------------------------------------------------------------+ 



Table 7-7: VDATA Field Descriptions 



Name  Extent  Type  Description 
----  ------  ----  ------------------------------------ 
DATA  31:0    RW    Data for data array reads and writes 



The VDATA IPR provides read and write access to the cache data array. When VDATA is written, 
the cache data array entry indexed by VMAR is written with the IPR data. Since the IPR data is 
a longword, two accesses to VDATA are required to read or write a quadword cache sub-block. 

Writes to VDATA with VMAR<2> = 0 simply accumulate the IPR data destined for the low longword 
of a sub-block in FILL_DATA<31:0>. A subsequent write to VDATA with VMAR<2> = 1 directs the 
IPR data to FILL_DATA<63:32>, and triggers a cache write sequence to the sub-block indexed 
by VMAR. 

Reads to VDATA with VMAR<2> = 0 trigger a cache read sequence to the sub-block indexed by 
VMAR. The low longword of the sub-block is returned as IPR read data. A read of VDATA with 
VMAR<2> = 1 returns the high longword of the sub-block as IPR data. 
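
The two-access protocol for VDATA can be modeled as follows. The class VdataPort is a hypothetical software stand-in for the data array and FILL_DATA latch. As a simplification, reads with VMAR<2> = 1 look up the array directly rather than returning data captured by the preceding low-longword read.

```python
class VdataPort:
    """Toy model of IPR access to the VIC data array through VMAR and VDATA."""

    def __init__(self):
        self.fill_data = 0      # models FILL_DATA<63:0>
        self.sub_blocks = {}    # (row, sub_block) -> 64-bit quadword

    @staticmethod
    def _index(vmar):
        return ((vmar >> 5) & 0x3F, (vmar >> 3) & 0x3)   # VMAR<10:5>, VMAR<4:3>

    def write(self, vmar, ipr_data):
        ipr_data &= 0xFFFFFFFF
        if (vmar >> 2) & 1 == 0:
            # Low longword: accumulate only, no array write yet.
            self.fill_data = (self.fill_data & 0xFFFFFFFF00000000) | ipr_data
        else:
            # High longword: complete the quadword and write the sub-block.
            self.fill_data = (self.fill_data & 0xFFFFFFFF) | (ipr_data << 32)
            self.sub_blocks[self._index(vmar)] = self.fill_data

    def read(self, vmar):
        quad = self.sub_blocks.get(self._index(vmar), 0)
        if (vmar >> 2) & 1:
            return (quad >> 32) & 0xFFFFFFFF
        return quad & 0xFFFFFFFF


port = VdataPort()
vmar = (26 << 5) | (3 << 3)          # row 26, sub-block 3, low longword
port.write(vmar, 0x11111111)
port.write(vmar | 0x4, 0x22222222)   # VMAR<2> = 1 triggers the array write
print(hex(port.read(vmar)), hex(port.read(vmar | 0x4)))
```

The ordering matters: writing the high longword first would commit a quadword whose low half is whatever FILL_DATA<31:0> last held.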



Figure 7-7: IPR D3 (hex), ICSR 



 31                                      5   4   3   2   1   0 
+------------------------------------------+---+---+---+---+---+ 
|                    0                     |   |   |   | 0 |   | :ICSR 
+------------------------------------------+---+---+---+---+---+ 
                                             |   |   |       | 
                              TPERR ---------+   |   |       | 
                              DPERR -------------+   |       | 
                              LOCK ------------------+       | 
                              ENABLE ------------------------+ 



Table 7-8: ICSR Field Descriptions 



Name    Extent  Type  Description 
------  ------  ----  -------------------------------------------------------- 
ENABLE  0       RW,0  Enable Bit. When set, allows cache access to the VIC. 
                      Initializes to 0 on RESET. 
LOCK    2       WC    Lock Bit. When set, validates and prevents further 
                      modification of the error status bits in the ICSR and 
                      the error address in the VMAR register. When clear, 
                      indicates no VIC parity error has been recorded and 
                      allows ICSR and VMAR to be updated. 
DPERR   3       RO    Data Error Bit. When set, indicates data parity error 
                      occurred in data array if the Lock Bit is also set. 
TPERR   4       RO    Tag Error Bit. When set, indicates tag parity error 
                      occurred in tag array if the Lock Bit is also set. 



The ICSR IPR provides control and status functions for the Ibox. VIC tag and data parity errors 
are latched in the read-only ICSR<4:3>, respectively. ICSR<2> is set when a tag or data parity 
error occurs and keeps the error status bits and the VMAR register from being modified further. 
Writing a logic one to ICSR<2> clears the LOCK bit and allows the error status to be updated. 
When ICSR<2> is clear, the values in ICSR<4:3> are meaningless. When ICSR<2> is set, a VIC 
parity error has occurred, and either ICSR<4> or ICSR<3> will be set indicating that the parity 
error was either a tag parity error or a data parity error, respectively. ICSR<4:3> cannot be 
cleared from software. ICSR<0> provides IPR control of the VIC enable. It is cleared on RESET. 
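
Software that reads the ICSR has to qualify the error bits with LOCK, as described above. The following decode routine is a hypothetical illustration of that rule; the function name and dictionary keys are not part of the specification.

```python
ICSR_ENABLE = 1 << 0
ICSR_LOCK = 1 << 2     # write-one-to-clear
ICSR_DPERR = 1 << 3
ICSR_TPERR = 1 << 4


def decode_icsr(icsr: int) -> dict:
    """Interpret an ICSR value; error bits are meaningful only when LOCK is set."""
    lock = bool(icsr & ICSR_LOCK)
    return {
        "enable": bool(icsr & ICSR_ENABLE),
        "lock": lock,
        "data_parity_error": lock and bool(icsr & ICSR_DPERR),
        "tag_parity_error": lock and bool(icsr & ICSR_TPERR),
    }


print(decode_icsr(0b10101))
print(decode_icsr(0b11000))
```

The second value has both error bits set but LOCK clear, so neither error is reported. Writing ICSR_LOCK back to the register is the documented way to clear LOCK and re-arm error capture.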



7.2.1.17 VIC Performance Monitoring Hardware 

Hardware exists in the Ibox VIC to support the NVAX Performance Monitoring Facility. See 
Chapter 18 for a global description of this facility. 

The VIC hardware generates two signals, I%PMUX0_H and I%PMUX1_H, which are driven to the 
central performance monitoring hardware residing in the Ebox. These two signals are used to 
supply VIC hit rate data to the performance monitoring counters. 

I%PMUX0_H is asserted the cycle when a VIC read reference is first attempted while the prefetch 
queue is not full. I%PMUX1_H signals the hit status for this event in the same cycle. 

The data is captured only on the first read reference that could be used by the PFQ to avoid skewed 
hit ratios caused by multiple hits or misses to the same reference while the prefetch queue is full 
or the VIC is waiting for a cache fill. 

7.2.2 The Prefetch Queue 

The PFQ is a 4-longword-deep queue for Istream data. When prefetching is enabled, the VIC 
controls the supply of data to the PFQ. The PFQ can accept one quadword of data each cycle. When 
the PFQ contains insufficient available space to load another quadword of data it asserts PFQ_FULL, 
which prevents the VIC from loading additional data into the PFQ. When the PFQ contains no 
unused Istream data it asserts PFQ_EMPTY and sends it to the IBU. 

The PFQ loads data from the M%MD_BUS_H<63:0> or VIC_DATA_BUS as directed by the load signals 
LOAD_MD_DATA and LOAD_VIC_DATA from the VIC. LOAD_MD_DATA is asserted by the VIC only 
when there are no errors associated with the data. Data loaded from the VIC_DATA_BUS must 
be conditioned with the error signal IHARD_ERR. If LOAD_VIC_DATA and IHARD_ERR are both 
asserted, corrupted data is loaded into the PFQ from the VIC_DATA_BUS. To prevent this data from 
being used, the IBU reports the error immediately and stops parsing data. 

The PFQ determines the number of valid unused bytes of Istream data available for parsing and 
sends this information to the IBU on AVAHJLE*. When the IBU retires Istream data it signals the 
PFQ on I_IBU%RETIRE_SPECB_H<5:0> and I_IBU%RETIRE_OPCODE the number of Istream bytes 
retired. These two signals are used to update the pointers in the PFQ. 

The output of the PFQ is directed through a MUX which aligns the data for use by the IBU. The 
alignment MUX takes the first and second longwords and the first byte from the third longword 
as inputs. The alignment MUX outputs 6 contiguous bytes starting from any byte in the first 
longword, based on the PFQ pointers. 
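
The input width of the alignment MUX follows from simple arithmetic: a 6-byte window that may start at any of the 4 bytes of the first longword can reach at most byte 8, so 9 bytes (two longwords plus one byte) are enough. The sketch below models this; the function name align is hypothetical, and little-endian byte numbering is used because the VAX is a little-endian architecture.

```python
def align(longwords, byte_offset):
    """Return the 6 contiguous Istream bytes starting at `byte_offset` (0..3).

    `longwords` holds the first three PFQ longwords in queue order; only the
    first byte of the third longword is ever used.
    """
    if not 0 <= byte_offset <= 3:
        raise ValueError("offset must select a byte in the first longword")
    window = (
        longwords[0].to_bytes(4, "little")
        + longwords[1].to_bytes(4, "little")
        + longwords[2].to_bytes(4, "little")[:1]
    )                                   # 4 + 4 + 1 = 9 input bytes
    return window[byte_offset:byte_offset + 6]


queue = [0x03020100, 0x07060504, 0x0B0A0908]
print(align(queue, 0).hex(), align(queue, 3).hex())
```

With offset 3 the window covers bytes 3 through 8, which is the only case that needs the byte from the third longword.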



7.2.2.1 PC load effects 

The PFQ is flushed when the BPU broadcasts a new PC load as indicated by I_BPU%LOAD_NEW_PC 
and when the Ebox asserts E%IBOX_LOAD_PC_L. In addition, when the BPU loads the PC, bits 
<2:0> of the new PC are decoded and used to set the PFQ pointer. 

7.3 Instruction Parsing 

The instruction parser identifies the different components of incoming VAX instructions and 
forwards those components to other parts of the Ibox for further processing. The instruction 
parser contains two logic sub-sections - the Instruction Burst Unit (IBU) and the Instruction 
Issue Unit (IIU). 

Figure 7-8: Prefetch Queue Block Diagram 

The IBU parses incoming instruction data into Opcodes, Operand Specifiers and Specifier 
Extensions, and Branch Displacements. This information is then passed on to the operand 
specifier processing logic. The opcode is also sent to the IIU, which generates an Ebox microcode 
entry point for this opcode and places it and other needed information in the instruction queue 
in the Ebox. See Table 7-15 for more information on the format of the Ebox instruction queue. 

Instruction parsing is logically divided into 2 distinct activities: instruction issue and specifier 
identification, and branch displacement and Ebox assist processing. The instruction issue and 
specifier identification activity starts when a new opcode is loaded by the IBU. The IBU sends 
the opcode to the IIU for issuing to the Ebox. The instruction opcode is also used to determine 
the number of operand specifiers and branch displacements associated with the instruction. In 
parallel with instruction issue, the IBU identifies the operand specifiers. When all the operand 
specifiers are processed, the IBU begins the branch displacement and Ebox assist processing 
activity. The branch displacement (if present) is sent to the BPU, and Ebox assist specifiers (if 
present) are processed. See Section 7.3.2.7 for more on Ebox assists. 



7.3.1 VAX Instruction Format 

There are 3 components in VAX instructions: opcodes, operand specifiers and specifier extensions, 
and branch displacements. The 1 or 2 byte opcode specifies the function to be performed. Operand 
specifiers with potential extensions range from 1 to 9 bytes and specify an instruction operand 
or operand location. The 1 or 2 byte branch displacements are signed offsets used to compute 
the destination PC in branch instructions. A VAX instruction is composed of an opcode and 
optionally up to 6 operand specifiers and one branch displacement. For a given opcode, the 
number of operand specifiers and branch displacements is fixed. 

The instruction opcode is the first one or two bytes in the instruction followed by the operand 
specifiers, followed by the branch displacement, all at successively increasing addresses. All 
references to opcodes in this section refer to one-byte opcodes unless specified otherwise. For 
more information on VAX instruction formats, opcodes, and operand specifiers, see DEC STD 
032, VAX Architecture Standard. 

7.3.2 The Instruction Burst Unit 

The IBU bursts apart Istream data into its component parts: opcodes, operand specifiers, and 
branch displacements. The IBU is capable of identifying an opcode and one operand specifier each 
cycle. Operand specifiers are categorized according to their Addressing Mode as being either 
simple or complex. Simple specifiers are register mode (Addressing Mode 5) and short literal 
(Addressing Modes 0..3). All other specifier types, including assists, are considered complex. 

The IBU retires up to 6 bytes of data from the PFQ each cycle. New data is available from the PFQ 
at the beginning of a cycle. The IBU sends the number of specifier bytes being retired back to the 
PFQ so that new data is available for processing by the next cycle. 

Instruction components extracted from the Istream data are sent to other parts of the Ibox for 
further processing. The opcode is sent to the IIU and the BPU on OPCODE<8:0>. The specifiers, 
except for branch displacements, are sent to the CSU, the SBU and the OQU via SPEC_CTRL<21:0>. 
Branch displacements are sent to the BPU on B_BRANCH_DISP<7:0> and SPEC_DATA<7:0>. 

The specifier control field SPEC_CTRL<21:0> contains information about the specifier being retired 
each cycle. SPEC_CTRL<21:14> and SPEC_DATA<31:0> contain information used in processing 
complex specifiers. Table 7-9 describes the information contained on these busses. 

Table 7-9: Specifier Control Fields 

Bit Field   Field Name          Description 

<0>         SHLIT               This bit is set if the specifier is a short literal. 

<6:1>       RN/SHORT LITERAL    Contains a 6-bit short literal if the SHLIT flag is set. <4:1> contains 
                                the general purpose register number associated with this specifier 
                                if the SHLIT flag is not set, in which case <6:5> are not used. 

<9:7>       AT                  Access type of the instruction operand with which this operand 
                                specifier is associated. 

<11:10>     DL                  Data length of the instruction operand with which this operand 
                                specifier is associated. 

<12>        VALID               Flags data valid on the bus. 

<13>        COMPLEX             This bit is set if this is a complex specifier. 

If the IBU is retiring a specifier, SPEC_CTRL<21:0> and SPEC_DATA<31:0> contain information 
about the specifier being retired. SPEC_CTRL<21:14> and SPEC_DATA<31:0> contain valid data 
used by the CSU only when the specifier is complex. If a simple specifier is being retired, the 
information on SPEC_CTRL<21:14> is invalid and not used by the CSU and the complex flag 
SPEC_CTRL<13> is not set. Table 7-10 describes the fields in SPEC_CTRL<21:14> used for complex 
specifiers. Table 7-11 describes the fields in SPEC_DATA<31:0> used by the CSU and BPU. When 
displacement and displacement deferred mode specifiers are processed, byte and word data length 
specifiers are sign extended to longword data length on SPEC_DATA<31:0>. 



Table 7-10: Complex Specifier Control Fields 

Bit Field   Field Name    Description 

<16:14>     DISPATCH      Dispatch address for Complex Specifier Unit Control Store. 

<17>        AT_RMW        1 if access type of operand is R, M or W. 

<18>        INDEXED       This bit is set if mode of previous specifier is index. 

<19>        ASSIST        This bit is set if this is an Ebox assist specifier. 

<20>        PC_MODE       This flag is set if bits <3:0> of the specifier point to GPR 15 = PC. 

<21>        JMP_OR_JSB    This bit is set if this instruction is a JMP or JSB. 


Table 7-11: Specifier Data Fields 

Bit Field   Field Name    Description 

<7:0>       WORD_DISP     Upper order byte of word displacement if branch displacement is being 
                          processed. Otherwise, the lower order byte of data for immediate and 
                          displacement mode specifiers. 

<31:8>      SP_DATA       Upper 3 bytes of data for immediate and displacement mode specifiers. 
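
The bit assignments in Tables 7-9 and 7-10 can be exercised with a small Python sketch that splits a 22-bit SPEC_CTRL value into named fields. The dictionary and function names are hypothetical, and the sketch does not interpret the AT, DL, or DISPATCH encodings, which are not given in this section.

```python
# Field layout copied from Tables 7-9 and 7-10: name -> (high bit, low bit).
SPEC_CTRL_FIELDS = {
    "SHLIT": (0, 0),
    "RN_SHORT_LITERAL": (6, 1),
    "AT": (9, 7),
    "DL": (11, 10),
    "VALID": (12, 12),
    "COMPLEX": (13, 13),
    "DISPATCH": (16, 14),
    "AT_RMW": (17, 17),
    "INDEXED": (18, 18),
    "ASSIST": (19, 19),
    "PC_MODE": (20, 20),
    "JMP_OR_JSB": (21, 21),
}


def extract(value, hi, lo):
    """Return bits <hi:lo> of value as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_spec_ctrl(value):
    """Split a SPEC_CTRL<21:0> value into a dict of raw field values."""
    return {name: extract(value, hi, lo)
            for name, (hi, lo) in SPEC_CTRL_FIELDS.items()}
```

Per the text above, the fields in <21:14> are meaningful only when COMPLEX is 1; a caller of this sketch would be expected to check that bit first.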



7.3.2.1 Specifier Identification 

In the instruction issue and specifier identification phase of instruction parsing, operand specifiers 
are parsed, and the necessary information about each specifier is sent to the specifier processing 
logic. The information needed by the Ebox to process the instruction is also identified and sent 
to the IIU. Each time a new opcode is loaded in the IBU, instruction context for that opcode is 
extracted from PLAs, complementary logic, and the Instruction ROM (IROM). This information is 
summarized in Table 7-12. 

As each specifier is identified, the current SPEC_COUNT is decremented. When this counter 
reaches 0, the IBU enters the next phase of instruction parsing, Ebox assist and branch 
displacement processing. 



Table 7-12: Instruction Context Summary 

Field Name      # bits   Description 

Instruction Context stored in the IROM 

SPEC_COUNT               Number of specifiers for this instruction 

STOP_PARSER              STP_SUPPRESS_PCQ/STP_RESTART_IBOX: 

                         0/0   Do not stop parser, make a PC queue entry for the next 
                               instruction. 

                         0/1   Stop parser at the end of the instruction, make a PC 
                               queue entry for the next instruction, and restart parser on 
                               E%RESTART_IBOX_H. 

                         1/0   Stop parser at the end of the instruction, suppress PC entry 
                               for next instruction until LOAD_NEW_PC is received, and restart 
                               parser on LOAD PC. See Table 7-14. 

                         1/1   Stop parser at the end of the instruction, suppress PC queue 
                               entry for next instruction until LOAD_NEW_PC is received, restart 
                               parser on E%RESTART_IBOX_H. 

ASSIST_COUNT             Number of Ebox assists for this instruction 

ASSIST                   Assist dispatch 

A_AT                     Access type for Ebox Assist 

A_DL                     Data Length for Ebox Assist 




NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 



Table 7-12 (Cont.): Instruction Context Summary 

Field Name      # bits   Description 

Instruction Context stored in the IROM 

A_REG           1        Register for Ebox Assist 
AT1             3        Access type for specifier # 1 
AT2             3        Access type for specifier # 2 
DL1             2        Data length for specifier # 1 
DL2             2        Data length for specifier # 2 
FB              1        1 when this is an Fbox instruction 
DISPATCH        9        Ebox microcode dispatch address 
E_DL            2        Data length for instruction execution 

Instruction Context stored in the PLAs 

AT3             3        Access type for specifier # 3 
AT4             3        Access type for specifier # 4 
AT5             1        Access type for specifier # 5 
AT6             1        Access type for specifier # 6 
DL3             2        Data length for specifier # 3 
DL4             2        Data length for specifier # 4 
DL5             2        Data length for specifier # 5 
DL6             1        Data length for specifier # 6 
B               1        Indicates that there is a branch displacement. 
DISP_SIZE       1        Size of the branch displacement. 0 = byte displacement, 1 = word 

Instruction Context decoded by logic 

VFIELD_SPEC     1        Indicates how many source queue entries to allocate for RMODE (Mode 5) 
                         specifiers with variable bit field access type. 0 = 1 entry, 1 = 2 entries. 
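
The four STOP_PARSER encodings listed in Table 7-12 reduce to three independent decisions, which the following Python sketch makes explicit. The function name and the dictionary keys are hypothetical; the restart event names are simply the strings used in the table.

```python
def stop_parser_actions(stp_suppress_pcq, stp_restart_ibox):
    """Decode the 2-bit STOP_PARSER field (STP_SUPPRESS_PCQ/STP_RESTART_IBOX)."""
    stop = bool(stp_suppress_pcq or stp_restart_ibox)
    if not stop:
        restart_on = None                 # 0/0: parser keeps running
    elif stp_restart_ibox:
        restart_on = "E%RESTART_IBOX_H"   # 0/1 and 1/1
    else:
        restart_on = "LOAD PC"            # 1/0
    return {
        "stop_parser": stop,
        # When set, the PC queue entry for the next instruction is held
        # back until LOAD_NEW_PC is received.
        "suppress_pc_queue_entry": bool(stp_suppress_pcq),
        "restart_on": restart_on,
    }
```

Read this way, STP_SUPPRESS_PCQ controls only the PC queue entry, while STP_RESTART_IBOX selects which event restarts the parser.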



Each cycle, the IBU evaluates the following information to determine if an operand specifier is 
available and how many PFQ bytes should be retired to get to the next opcode or specifier: 

• The number of PFQ bytes available. Each cycle, the PFQ provides the IBU with the number of 
instruction stream bytes available on AVAIL_LE<5:0>. This can be as few as 0 and as many 
as 6. 

• The number of specifiers left to be parsed in the instruction stream. The IBU keeps a running 
count of the number of specifiers left to be parsed for the current instruction. 

• The data length of the next specifier. 

• The COMPLEX_UNIT_BUSY flag S1_VALID. When the CSU is busy and cannot accept another 
complex specifier, S1_VALID is asserted. If the IBU identifies a complex specifier while this 
signal is asserted, it stalls until the flag is cleared by the CSU. 

• DATA_LENGTH_VALID flag. This flag is asserted when the instruction PLAs have valid data 
length information ready. This flag is cleared when a new opcode is loaded and set when the 
access type and data length information is available for use. 

• Specifier bus enable flag, SPEC_CTRL_ENABLE, from the OQU. This flag enables the loading 
of specifier information onto the specifier control bus. If SPEC_CTRL_ENABLE is 1 then the 
specifier control bus is enabled, and one specifier can be processed. If SPEC_CTRL_ENABLE is 
0 then no specifiers can be processed, and the IBU stalls. 

• The parser stopped flag PARSER_STOPPED. There are many times when the parser must 
be stopped to prevent it from interfering with Ebox activity. When this is necessary, 
PARSER_STOPPED is asserted and all parser activity stops. 

• The next 2 bytes of the instruction stream. 

If the specifier byte is a simple specifier (Addressing Modes 0..3, or 5), and the following conditions 
are met, then the information for this specifier is driven onto SPEC_CTRL<12:0>, and the specifier 
byte is retired from the PFQ at the end of the cycle: 

1. There are at least 2 bytes of valid PFQ data. (At least one byte in the specifier field and one 
byte in the opcode field.) 

2. The parser is not stopped. 

3. There is at least one specifier remaining for this instruction. 

4. SPEC_CTRL_ENABLE = 1. 

If the first specifier byte is a complex specifier, and the following conditions are met, then the 
information for this specifier is driven onto SPEC_CTRL<21:0> and SPEC_DATA<31:0>, and the 
appropriate number of PFQ bytes for this specifier are retired from the PFQ at the end of the cycle: 

1. The number of bytes required according to the Addressing Mode and Data Length of the 
specifier (plus one for the opcode field) are available from the PFQ. 

2. The parser is not stopped. 

3. There is at least one specifier remaining for this instruction. 

4. SPEC_CTRL_ENABLE = 1. 

5. COMPLEX_UNIT_BUSY flag is not asserted. 
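
The two condition lists above differ only in the byte count and the CSU busy check. A minimal Python sketch of the combined test follows; the function and parameter names are illustrative assumptions, and spec_bytes_needed stands for the byte count implied by the Addressing Mode and Data Length of a complex specifier.

```python
def can_retire_specifier(is_complex, pfq_bytes_available, spec_bytes_needed,
                         parser_stopped, specifiers_remaining,
                         spec_ctrl_enable, complex_unit_busy):
    """Return True if the specifier at the head of the PFQ can retire this cycle."""
    # A simple specifier is one byte; either kind needs one more byte
    # for the opcode field.
    spec_bytes = spec_bytes_needed if is_complex else 1
    enough_data = pfq_bytes_available >= spec_bytes + 1
    common = (enough_data
              and not parser_stopped
              and specifiers_remaining >= 1
              and spec_ctrl_enable)
    if is_complex:
        # Complex specifiers additionally need the CSU to be free.
        return common and not complex_unit_busy
    return common
```

When the function returns False the IBU stalls; a simple specifier is unaffected by COMPLEX_UNIT_BUSY, which is why that argument is ignored on the simple path.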

7.3.2.2 Operand Access Types 

There are 6 different access types for operands. The access type information determines whether 
the operand is a source or destination operand, and whether the operand, or the address of the 
operand is needed by the Ebox. These access types are modeled after, but are not identical to, 
the operand access types specified in the architectural summary. 

• A (Address) 

An operand with access type = A is a source operand. The Ebox gets the address of the 
operand, not the actual operand. 

• R (Read) 

An operand with access type = R is a source operand. The Ebox gets the actual operand. 

• M (Modify) 

An operand with access type = M is both a source and a destination. The Ebox gets the actual 
operand and a pointer to the destination. 

• W (Write) 

An operand with access type = W is a destination operand. The Ebox gets a pointer to the 
destination. 

• VR (Variable bit field read-access) 

An operand with access type = VR is a source operand. The Ebox gets the actual operand 
if the addressing mode of the specifier for the operand is RMODE (Mode 5). Otherwise the 
Ebox gets the address of the operand. 

• VM (Variable bit field modify-access) 

An operand with access type = VM is both a source and a destination. The Ebox gets the 
actual operand if the addressing mode of the specifier for the operand is RMODE (Mode 5). 
Otherwise the Ebox gets the address of the operand. If the operand specifier is RMODE, the 
Ebox gets a pointer to the destination. Otherwise no destination pointer is supplied. 
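
The six access types can be summarized as a function from access type and addressing mode to what the Ebox receives. The Python sketch below restates the list above; the function name and the tuple layout are assumptions made for illustration.

```python
def ebox_receives(access_type, is_rmode=False):
    """Return (operand, address, destination_pointer) flags for one operand.

    access_type: one of "A", "R", "M", "W", "VR", "VM".
    is_rmode: True if the specifier is register mode (Mode 5); it only
              matters for the variable bit field types VR and VM.
    """
    if access_type == "A":
        return (False, True, False)
    if access_type == "R":
        return (True, False, False)
    if access_type == "M":
        return (True, False, True)
    if access_type == "W":
        return (False, False, True)
    if access_type == "VR":
        return (True, False, False) if is_rmode else (False, True, False)
    if access_type == "VM":
        return (True, False, True) if is_rmode else (False, True, False)
    raise ValueError("unknown access type: %r" % (access_type,))
```

For VR and VM the non-register case collapses to the same result as access type A, since only the address is delivered.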

7.3.2.3 DL stall 

For all but one addressing mode, the number of bytes to retire for a specifier is determined 
entirely by the addressing mode. Immediate mode (8F) addressing, however, requires the data 
length information for the operand to determine how many PFQ bytes to retire. In the event 
that a new opcode is loaded and the first specifier is an immediate mode specifier, the absence of 
DATA_LENGTH_VALID causes the IBU to stall because there is no way to determine the number of 
PFQ bytes to retire for this specifier. DATA_LENGTH_VALID is asserted the following cycle, after the 
opcode has passed through the instruction PLAs and IROM to generate the required data length 
information. The immediate mode specifier can be retired the following cycle if the conditions 
described above are met. 
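
The dependence on data length can be illustrated with a Python sketch that computes the length of one operand specifier from its first byte. The mode encodings used here are the standard VAX ones from DEC STD 032 rather than anything defined in this section, the function and table names are hypothetical, and the result is the architectural specifier length, not the number of bytes the IBU retires in a single cycle (quadword immediates, for example, are retired in two steps).

```python
DATA_LENGTH_BYTES = {"byte": 1, "word": 2, "long": 4, "quad": 8}


def specifier_length(first_byte, data_length=None):
    """Return the length in bytes of one operand specifier.

    Returns None for immediate mode (8F) when data_length is not yet
    known, which corresponds to the DL stall described above.
    """
    mode = (first_byte >> 4) & 0xF
    reg = first_byte & 0xF
    if mode <= 3:
        return 1                      # short literal
    if mode in (4, 5, 6, 7):
        return 1                      # index prefix, register, (Rn), -(Rn)
    if mode == 8:
        if reg != 15:
            return 1                  # (Rn)+
        if data_length is None:
            return None               # immediate: must wait for the data length
        return 1 + DATA_LENGTH_BYTES[data_length]
    if mode == 9:
        return 5 if reg == 15 else 1  # absolute carries a longword address
    if mode in (0xA, 0xB):
        return 2                      # byte displacement (and deferred)
    if mode in (0xC, 0xD):
        return 3                      # word displacement (and deferred)
    return 5                          # longword displacement (and deferred)
```

Only the mode 8, register 15 branch consults data_length, matching the statement that immediate mode is the single addressing mode whose byte count is not fixed by the mode alone.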

7.3.2.4 Driving SPEC_CTRL 

The data on SPEC_CTRL<13:0> is used by the OQU to generate Ebox source queue and destination 
queue entries that may be needed in the next cycle. The data on SPEC_CTRL<21:14> is used by 
the CSU to generate the microcode dispatch addresses. SPEC_DATA<31:0> contains instruction 
stream data for Immediate and Displacement mode specifiers. 

7.3.2.5 PC and Delta_PC 

The IBU keeps a local copy of the PC called the IBU_PC which points to the next byte of Istream 
data that will be processed by the IBU. 

When the IBU retires instruction stream data, the IBU_PC is incremented by the number of opcode 
and operand specifier bytes retired as signaled by SPEC_BYTES_RETIRED and LOAD_NEW_OPCODE. 
The IBU_PC can be loaded from NEW_PC<31:0> when the signal LOAD_NEW_PC is asserted 
and all operand specifier, Ebox assist, and branch displacement processing is completed by the 
IBU. The IBU_PC is sent to the CSU, IIU and BPU on IBU_PC<31:0>. 

7.3.2.6 Branch Displacement Processing 

Some instructions have branch displacements as indicated by B. If B is set, the instruction has 
a branch displacement and the branch size is determined by DISP_SIZE. Both B and DISP_SIZE 
are outputs of the instruction PLAs. A DISP_SIZE of 0 indicates a byte branch displacement and a 
DISP_SIZE of 1 indicates a word displacement. 

The branch displacement is always the last piece of data for an instruction and is used by the BPU 
to compute the branch destination. Branch displacements are not sent to the specifier parsing 
logic. They are sent only to the BPU on SPEC_DATA<7:0> and B_BRANCH_DISP<7:0>. Branch 
displacement processing begins after all the non-displacement specifiers are parsed and retired 
from the PFQ. A branch displacement is processed when the following conditions are met: 

1. There are no specifiers left to be processed (Ebox assists excluded). 

2. The branch flag B<0> is set in the instruction PLAs and the branch displacement has not been 
processed. 

3. The required number of bytes is available from the PFQ according to DISP_SIZE. 

4. The parser is not stopped. 

5. BRANCH_STALL is not asserted. BRANCH_STALL occurs on the load opcode of the next 
instruction after a second conditional branch is received. 

BRANCH_STALL is described in Section 7.5.1.6. 

If all these conditions are met, then the branch displacement is placed on SPEC_DATA<7:0> and 
B_BRANCH_DISP<7:0> and DISP_VALID is asserted. SPEC_DATA<7:0> contains the high byte of 
a word branch displacement and B_BRANCH_DISP<7:0> contains the low byte of a word branch 
displacement or the byte branch displacement. If these conditions are not met, the IBU stalls. 

If an instruction contains no operand specifier, the branch displacement can be processed during 
the same cycle that the opcode is processed provided that there is sufficient data in the PFQ. 
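
The split of a branch displacement across the two busses can be illustrated with a short Python sketch that reassembles the signed offset. The function and parameter names are hypothetical; the byte placement follows the description above, and the sign extension follows from branch displacements being signed offsets (Section 7.3.1).

```python
def branch_displacement(b_branch_disp, spec_data_low_byte, disp_size):
    """Rebuild the signed branch displacement.

    b_branch_disp: value on B_BRANCH_DISP<7:0> (byte displacement, or the
                   low byte of a word displacement).
    spec_data_low_byte: value on SPEC_DATA<7:0> (high byte of a word
                        displacement; ignored for byte displacements).
    disp_size: DISP_SIZE, 0 = byte displacement, 1 = word displacement.
    """
    if disp_size == 0:
        value, bits = b_branch_disp & 0xFF, 8
    else:
        value = ((spec_data_low_byte & 0xFF) << 8) | (b_branch_disp & 0xFF)
        bits = 16
    sign_bit = 1 << (bits - 1)
    # Two's complement sign extension.
    return (value ^ sign_bit) - sign_bit
```

The destination PC would then be the updated PC plus this value; that addition is done by the BPU and is not modeled here.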



7.3.2.7 Ebox Assist Processing 

Ebox assist processing can go on in parallel with branch displacement processing since they 
require no common resources. Ebox assists are implicit specifiers which help the Ebox speed 
up some of the time-critical instructions. To the CSU, these assists look very similar to normal 
complex specifiers and have associated with them all the normal access type, data length and 
register information. The only real difference is where this data comes from. Since these specifiers 
are not a part of the instruction stream, information about them must be stored in the IROM. The 
7 Ebox assists are summarized in the following table: 



Table 7-13: Ebox Assist Summary 

Assist            Access   Data 
Name              Type     Length   Register   Description 

RET_DEST          Read     Quad     FP         Read register mask for Ebox. Read return PC for 
                                               Ebox and BPU 
RSB_DEST          Read     Long     SP         Read return PC for Ebox and BPU 
(SP)+.RQ          Read     Quad     SP         Quadword stack pop 
-(SP).WL          Write    Long     SP         Longword stack push 
PC.RL             Read     Long     NONE       Current PC is sent to Ebox 
PC.-(SP).ML       Modify   Long     SP         Combines effects of PC.RL and -(SP).WL assists 
STOP_MBOX_QUEUE   NONE     NONE     NONE       Mbox specifier queue is stopped 



All of the Ebox assists generate dispatches to the CSU. 

When all the normal specifiers for an instruction have been identified and retired from the PFQ, 
the Ebox assist (if any) is processed. The maximum number of assists for any instruction is 1. 

An Ebox assist is processed and its associated data driven onto SPEC_CTRL<21:0> when the 
following conditions are met: 

1. There is an Ebox assist. 

2. The parser is not stopped. 

3. It is not the same cycle as the opcode load. 

4. If the instruction is BSBW or BSBB, the branch displacement has been parsed. 

5. SPEC_CTRL_ENABLE = 1. 

6. COMPLEX_UNIT_BUSY flag is not asserted. 

BSBW and BSBB instructions have PC.RL Ebox assists. For these instructions, the branch 
displacement must be retired and the IBU_PC must be updated to point to the byte following the 
branch displacement before the PC.RL assist can be processed. 



7.3.2.8 Reserved Addressing Modes 

Some combinations of specifier mode, specifier register, and access type cause reserved addressing 
mode faults in the VAX architecture. Refer to Table 7-33 for more details on reserved addressing 
mode detection. 

7.3.2.9 Quadword Immediate Specifiers 

Immediate mode specifiers with quadword data length take two or more cycles to process. When 
a quadword immediate specifier is detected by the IBU parse logic, the first longword is processed 
(like a longword immediate specifier) and QUAD_FLAG is set. 

QUAD_FLAG is used by the IBU retire logic to properly retire the next four bytes when they 
become available in the PFQ. When the second longword is retired, QUAD_FLAG is cleared and the 
specifier count is decremented. QUAD_FLAG is also cleared by E%BRANCH_MISPREDICT_H, 
E%STOP_IBOX_H, I%IMEM_MEXC_H, and I%IMEM_HERR_H. 

The first longword of the quadword immediate data is sent to the CSU in the normal fashion. The 
second longword of the quadword immediate data from the instruction stream is discarded. The 
CSU then uses the specifier PC and generates a memory request to fetch the next four bytes of 
the immediate data. 

7.3.2.10 Index Mode Specifiers 

Index mode specifiers are two-part specifiers which take two or more cycles to process. The 
first byte of an index mode specifier specifies the index register; it is treated like any other 
complex specifier with the exception that a flag, index_wait, is set, and the specifier counter is 
NOT decremented. Additionally, SPEC_CTRL<21:17> is ignored by the CSU. 

When the second byte of an index mode specifier is processed, the specifier counter is decremented 
and SPEC_CTRL<21:17> contains the appropriate data. SPEC_CTRL<18> is set and index_wait is 
cleared. 

The reserved addressing mode fault PLA in the IBU checks the mode of the second specifier byte. 
If index_wait is set, and if the second byte is short literal, register mode, or index mode, a 
reserved addressing mode fault is detected and sent to the Ebox on I%RSVD_ADDR_FAULT_H. Refer 
to Table 7-33 for more details on reserved addressing mode detection. 



7.3.2.11 Loading a new opcode 

A new opcode is loaded in the IBU under the following conditions: 

1. All operand specifiers, branch displacements and Ebox assists for the current instruction have 
been parsed (which asserts INSTR_DONE). 

2. The parser is not stopped. 

3. There is at least one byte of data available from the PFQ. 

4. ISSUE_STALL is not being asserted by the IIU. 

5. BRANCH_STALL is not being asserted by the BPU. 

New opcodes are loaded and passed directly to the instruction PLAs and IROM. In parallel, the 
instruction issue and specifier identification process for the new instruction begins. 

When a new opcode is loaded, a check is made to see if the value of the opcode is FD (hex). If it is, 
no instruction parsing is done this cycle. FD_OPCODE is set, the byte is retired from the PFQ, and 
another opcode load is enabled for the following cycle. The opcode sent to the IIU and the BPU on 
OPCODE<8:0> is a concatenation of FD_OPCODE and the opcode byte. FD_OPCODE is bit 8, and 
the opcode is in <7:0>. 

7.3.2.12 Reserved Opcodes 

Each time a new opcode is loaded in the IBU, instruction and operand specifier information is 
extracted from a set of PLAs and from the IROM in the IBU for that opcode. This information is 
specified in Table 7-12. When a reserved or unimplemented opcode is detected, the following 
occurs: 

1. The IBU IROM has one of the STOP_PARSER bits set. This signals the IBU to stop parsing 
instruction stream data. 

2. The IBU IROM provides the reserved opcode dispatch address for Ebox microcode. 



7.3.2.13 Instruction Parse Completion 

Once all the operand specifiers, branch displacements and Ebox assists have been processed, 
instruction parsing is complete and INSTR_DONE is asserted. INSTR_DONE is used by the CSU to 
make RLOG base queue entries and by the IBU to control loading of the BPU_PC under certain 
conditions. 

Additionally, if instruction parsing is complete and if there is no PC load pending, RETIRE_OPCODE 
is asserted and sent to the PFQ control logic and the IIU PC queue logic. In the PFQ this signal 
increments the number of specifier bytes retired by 1 in order to retire the previous opcode and 
allow for loading of the new opcode. It is used in the IIU to update the PC queue pointer under 
certain conditions. 



7.3.2.14 Operands with Access Type VR and VM 

One of the outputs from the instruction PLAs is a bit that indicates how many source queue 
entries should be written for VR and VM access type operands with register mode specifiers. 
When this bit is 0, only one source queue entry is written; when it is 1, two are written. This 
bit is available in the middle of the opcode load cycle and is sent to the OQU on VS. This signal 
remains valid throughout the instruction parsing operation. 



7.3.2.15 I%IMEM_MEXC_H and I%IMEM_HERR_H 

The IBU forwards Istream errors to the Ebox on I%IMEM_HERR_H and I%IMEM_MEXC_H. These 
signals flag memory management exceptions and hardware errors. The IBU receives three 
error signals from the VIC which are used to determine when to assert I%IMEM_HERR_H and 
I%IMEM_MEXC_H: IHARD_ERR, MHARD_ERR, and IMMGT_EXC. Refer to Section 7.2.1.7 for more 
detail on these signals. 

The IBU asserts I%IMEM_MEXC_H if IMMGT_EXC is asserted from the VIC and the PFQ is empty or 
contains insufficient data to complete parsing of the current specifier, and parsing is not stopped. 
I%IMEM_MEXC_H remains asserted as long as these conditions are met. 

The IBU asserts I%IMEM_HERR_H under two different conditions. First, it is asserted if MHARD_ERR 
is asserted from the VIC and the PFQ is empty or contains insufficient data to complete parsing of 
the current specifier, and parsing is not stopped. Second, if IHARD_ERR is asserted from the 
VIC, I%IMEM_HERR_H is asserted immediately without waiting for the PFQ to run dry or contain 
insufficient data. I%IMEM_HERR_H remains asserted as long as these conditions are met. 
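
The assertion rules for the two error outputs can be written as combinational logic. In the Python sketch below, the function name and the pfq_insufficient argument (true when the PFQ is empty or cannot complete the current specifier) are assumptions introduced for illustration.

```python
def imem_error_signals(ihard_err, mhard_err, immgt_exc,
                       pfq_insufficient, parser_stopped):
    """Return (I%IMEM_MEXC_H, I%IMEM_HERR_H) as booleans."""
    # MHARD_ERR and IMMGT_EXC are reported only once the parser has
    # actually run out of good Istream data and is still trying to parse.
    starved = pfq_insufficient and not parser_stopped
    mexc = immgt_exc and starved
    # IHARD_ERR is reported immediately, regardless of PFQ contents.
    herr = ihard_err or (mhard_err and starved)
    return (mexc, herr)
```

The asymmetry reflects the earlier statement that IHARD_ERR marks corrupted data already loaded into the PFQ, so it cannot wait for the queue to drain.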



7.3.2.16 IBU stop and restart conditions 

Two categories of conditions cause the IBU to stop parsing: the first is exceptions, the second is 
instructions which need pipeline synchronization. When the IBU is stopped, PARSER_STOPPED is 
asserted. 

Table 7-14 summarizes all IBU stop and restart conditions. 



Table 7-14: IBU stop and start summary 

Stop Condition           Start Condition         Description 

E%STOP_IBOX_H            E%RESTART_IBOX_H        Stop Ibox, Ebox restarts parser 

I%RSVD_ADDR_FAULT_H      E%RESTART_IBOX_H        Reserved addressing mode fault, Ebox restarts parser 

IHARD_ERR                E%RESTART_IBOX_H        VIC hardware error, Ebox restarts parser 

FPD and load opcode      E%RESTART_IBOX_H        FPD is set, parse opcode and stop parser, Ebox restarts 
                                                 parser 

E%BRANCH_MISPREDICT_L    I_CSU%IBOX_RESTART      Branch mispredict, Ibox restarts parser 

stop parser set -        I_CSU%IBOX_RESTART      Parser stopped when STP_RESTART_IBOX and INSTR_DONE are 
case 1                                           both asserted, Ibox restarts parser 

stop parser set -        I_IBU%CSU_LD_RESTART    Parser stopped when STP_SUPPRESS_PCQ and INSTR_DONE 
case 2                                           are both asserted and STP_RESTART_IBOX is de-asserted, 
                                                 restart occurs when the CSU supplies the BPU with the new 
                                                 PC and all other instruction parsing is complete 



7.3.2.17 First Part Done (FPD) Set 

Some long instructions can be interrupted in the middle of their execution sequence (e.g. MOVC 
instructions). When such an instruction is interrupted, the first part done bit (FPD) in the 
Processor Status Longword (PSL) is set indicating that the interrupted instruction will be 
resumed at the execution point where the interrupt occurred, rather than at the beginning of 
the instruction. All such instructions have one of the STOP_PARSER bits set in the ROM. This 
allows the FPD pack-up to IPR read the current PC (from the top of the PC queue) and then load 
the PC of the interrupt handler. 

When an instruction such as MOVC is interrupted, and the interrupt is processed, processor 
context is switched back to the interrupted process by the REI instruction. This instruction 
causes the PSL of the interrupted process to be reloaded with the FPD bit set. The Ebox sends the 
E%FPD_SET_L signal to the Ibox. If E%FPD_SET_L is asserted, the Ibox will re-issue the interrupted 
instruction when valid opcode data is parsed by the IBU. However, after parsing and issuing the 
instruction, no further data is parsed by the IBU. 

When the interrupted instruction is complete, the Ebox loads the PC of the next instruction and 
parsing is restarted by the IBU. 



7.3.3 The Instruction Issue Unit 

The IIU takes opcodes received from the IBU and generates the information needed by the Ebox to 
begin instruction execution. An instruction is said to be issued when this information is sent to 
the Ebox instruction queue. Table 7-15 shows the format of the instruction queue entries created 
by the IIU. This information is sent to the Ebox on I%IQ_BUS_H<21:0>. 

The IIU must also keep track of the program counter (PC) values of the opcodes that are either in 
the instruction queue or are in Ebox execution. If the Ebox detects a fault during the execution 
of an instruction, it needs to be able to get at the PC of the faulting opcode. These PCs are kept 
in the PC queue. 



Table 7-15: Instruction Queue Entry Format 

Bit Field   Field Name   Description 

<0>         VALID        1 when this queue entry is valid 

<9:1>       DISPATCH     Ebox microcode dispatch address 

<10>        FB           1 when this is an Fbox instruction 

<12:11>     DL           Data length for instruction execution 

<21:13>     OPCODE       Instruction Opcode 



Most of the information needed to create an instruction queue entry is stored in the instruction 
ROM located in the IBU. See Table 7-12. The opcode used to access the ROM is a 9-bit composite 
opcode consisting of 8 true opcode bits and 1 bit indicating whether or not this is a two byte FD 
opcode. This extra bit is generated by the IBU and passed along with the other 8 opcode bits. 
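
The composite opcode and the entry layout of Table 7-15 can be illustrated with a Python sketch that packs the fields into a 22-bit value. The function names are hypothetical, and the dispatch and data length arguments are treated as opaque numbers because their encodings are not given here.

```python
def composite_opcode(opcode_byte, fd_opcode):
    """Form the 9-bit opcode: FD_OPCODE in bit 8, opcode byte in <7:0>."""
    return ((1 if fd_opcode else 0) << 8) | (opcode_byte & 0xFF)


def pack_iq_entry(valid, dispatch, fb, dl, opcode9):
    """Pack one instruction queue entry as laid out in Table 7-15."""
    return ((valid & 0x1)                # <0>     VALID
            | ((dispatch & 0x1FF) << 1)  # <9:1>   DISPATCH
            | ((fb & 0x1) << 10)         # <10>    FB
            | ((dl & 0x3) << 11)         # <12:11> DL
            | ((opcode9 & 0x1FF) << 13)) # <21:13> OPCODE


# Example: a two byte opcode whose second byte is 32 (hex).
entry = pack_iq_entry(1, 0x055, 0, 2, composite_opcode(0x32, True))
```

Because the opcode field is 9 bits wide, one byte and two byte FD opcodes with the same low byte occupy distinct values in <21:13>.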

The IIU issues an instruction as soon as the instruction ROM access completes unless the 
instruction queue is full. The instruction queue full status is computed and maintained locally 
in the IIU. 



7.3.3.1 Issue Stall 

The IIU maintains a counter of the number of slots filled in the Ebox instruction queue. Each 
time a new opcode is issued to the IIU, the counter is incremented. When the Ebox removes an 
entry from the queue as indicated by the E%RETIRE_INSTR_L signal, the counter is decremented. 
When the counter equals 6, the depth of the instruction queue, ISSUE_STALL is asserted, blocking 
the IBU from parsing a new opcode. 

7.3.3.2 PC Queue and PC loads 

The PC queue is a 7 entry FIFO which contains PC values of opcodes that are either in the 
instruction queue or are in Ebox execution. Opcode PCs are added to the back of the queue 
as instructions are issued and removed from the front of the queue when the Ebox retires an 
instruction as indicated by E%RETIRE_INSTR_L. The PC of the next instruction to be retired by 
the Ebox is always at the front of the queue unless the PC queue is empty. The PC queue is 
flushed on chip reset or when either E%FLUSH_PCQ_H or E%BRANCH_MISPREDICT_L is asserted 
by the Ebox. 

Any time the Ibox broadcasts a new PC on NEW_PC<31:0>, as signaled by LOAD_NEW_PC, it is 
loaded into the next available slot in the PC queue. If E%BRANCH_MISPREDICT_L caused the PC 
load or if the Ebox stops the Ibox as signaled by E%STOP_IBOX_H, then the following additional 
actions are taken: 

• The instruction queue counter is cleared. 

• ISSUE_STALL is cleared if set. 

In the event of an Ebox PC load, the parser is guaranteed to stop either by E%STOP_IBOX_H, 
STP_SUPPRESS_PCQ, or STP_RESTART_IBOX several cycles before the actual PC load occurs. These 
signals are used in the IBU to stop instruction parsing. When the new PC arrives, the PC queue 
is empty and ready to accept the new PC into the first available slot. 

The value of STP_SUPPRESS_PCQ affects whether the PC queue loads the next PC as the parser 
stops. If STP_SUPPRESS_PCQ is asserted then the next PC is entered in the PC queue. 

The value of the IBU_PC is loaded into the PC queue if LOAD_NEW_PC is not asserted, the burst 
unit signals that the parsing is complete with RETIRE_OPCODE, E%FPD_SET_L is not asserted, and 
either of the following conditions is true: 

• STP_SUPPRESS_PCQ is not asserted or STOP_VIC_PREFETCH is not asserted, and the BPU is 
not stalled 

• BSH_FRC_PCQ (from the BPU) is asserted and the instruction is done. 

The PC at the front of the PC queue is readable by the CSU. When the Ebox needs access to this 
PC, it stops the Ibox and sends an IPR read request to the CSU. The CSU responds by reading the 
front of the PC queue and then writing that value to the Ebox working register (WX) specified 
by a register index supplied with the IPR command. See Section 7.4.2.8 for more details on IPR 
transactions. 

MICROCODE RESTRICTION 

For proper operation, retire_instr and IPR read of the BPC (Backup PC) from the PC 
queue must not occur in the same microword. This guarantees that the PC queue does 
not decrement in the same cycle that an IPR read of the BPC occurs. 
7.4 Operand Specifier Processing 

Operand Specifier Parsing prepares instruction operands for access by the Ebox. The three 
Ibox sub-sections which together perform this function are the Operand Queue Unit (OQU), the 
Complex Specifier Unit (CSU), and the Scoreboard Unit (SBU). The OQU handles simple specifiers 
and acts as the interface to the Ebox source and destination queues; the CSU is responsible for 
processing complex specifiers, and the SBU provides the CSU with information about the number 
of outstanding GPR read and write references in the source and destination queues. 

7.4.1 Operand Queue Unit 

The OQU controls the passing of operand information into the Ebox operand queues and the 
allocation of Ebox Memory Data registers (MDs). 

Simple specifiers are processed entirely in the OQU. Register mode specifiers are passed into the 
source or destination queues as pointers to the corresponding Ebox register file location. The OQU 
passes short literal specifiers as immediate data. 

The 6 MD registers in the Ebox register file are used as destinations for operand data requests 
made by the CSU. When a complex specifier appears on the specifier control bus, the OQU allocates 
both the source queue entries and Ebox MDs and passes the Ebox register file index of the first 
allocated MD to the CSU. 

The I%OPERAND_BUS_H<14:0> transfers source and destination queue entry information to the 
Ebox. There may be up to 2 source queue entries and 2 destination queue entries made via the 
I%OPERAND_BUS_H<14:0> in a given cycle. The format for this bus is shown in Figure 7-9. 

Figure 7-9: Source/Destination Queue Entry Formats 

[Figure: I%OPERAND_BUS_H<14:0> bit layouts for four cases: Short Literal Mode, Register 
Mode, All Other Modes for access types read and modify, and All Other Modes for access type 
write. The individual fields are listed in the definition tables that follow.] 

Short literals: 
Table 7-16: I%OPERAND_BUS_H Definition 

Field Name    Bit Field    Description 

VALUE2        3:0          Upper bits for quadword short literal, must be zeros 
VALUE1        9:4          Short literal value. Lower bits for quadword 
SHLIT         10           Short literal. 1 if short literal, 0 otherwise 
DQ_VALID2     11           Valid second destination queue entry - always 0 for short literal 
DQ_VALID1     12           Valid destination queue entry - always 0 for short literal 
SQ_VALID2     13           Valid second source queue entry - set if quadword short literal 
SQ_VALID1     14           Valid source queue entry 


All other modes: 

Table 7-17: I%OPERAND_BUS_H Definition 

Field Name    Bit Field    Description 

REG2          3:0          Register or MD for 2nd source/dest queue entry of a quadword specifier 
REG1          7:4          Register or MD for 1st source/dest queue entry 
GPR           8            Source/dest queue entry is for a register mode specifier 
VFIELD        9            Field queue entry to be made 
SHLIT         10           Short literal. 1 if short literal, 0 otherwise 
DQ_VALID2     11           Valid second destination queue entry for quadword specifiers 
DQ_VALID1     12           Valid destination queue entry 
SQ_VALID2     13           Valid second source queue entry for quadword specifiers 
SQ_VALID1     14           Valid source queue entry 
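
The bit assignments in the two definition tables above can be exercised with a small decoder. This is an illustrative sketch only: the function names are hypothetical, and the bus value is treated as a plain 15-bit integer with bit 14 as the most significant bit.

```python
def decode_operand_bus(word):
    """Split a 15-bit I%OPERAND_BUS_H value into named fields (hypothetical helper)."""
    if not 0 <= word < (1 << 15):
        raise ValueError("I%OPERAND_BUS_H is 15 bits wide")
    fields = {
        "SQ_VALID1": (word >> 14) & 1,
        "SQ_VALID2": (word >> 13) & 1,
        "DQ_VALID1": (word >> 12) & 1,
        "DQ_VALID2": (word >> 11) & 1,
        "SHLIT": (word >> 10) & 1,
    }
    if fields["SHLIT"]:
        # Short literal format: VALUE1 in <9:4>, VALUE2 in <3:0>.
        fields["VALUE1"] = (word >> 4) & 0x3F
        fields["VALUE2"] = word & 0xF
    else:
        # All other modes: VFIELD <9>, GPR <8>, REG1 <7:4>, REG2 <3:0>.
        fields["VFIELD"] = (word >> 9) & 1
        fields["GPR"] = (word >> 8) & 1
        fields["REG1"] = (word >> 4) & 0xF
        fields["REG2"] = word & 0xF
    return fields


def encode_short_literal(value, quad=False):
    """Build the bus value for a short literal source operand (hypothetical helper)."""
    if not 0 <= value < 64:
        raise ValueError("short literals are 6 bits")
    word = (1 << 14) | (1 << 10) | (value << 4)  # SQ_VALID1, SHLIT, VALUE1
    if quad:
        word |= 1 << 13  # SQ_VALID2; VALUE2 stays zero
    return word
```

For a quadword short literal, both source queue valid bits are set and VALUE2 decodes as zero, matching the field descriptions.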



7.4.1.1 Source Queue Interface 

The OQU can write up to two source queue entries each cycle depending on the access type and 
data length of the operand they specify. I%OPERAND_BUS_H<SQ_VALID1> 
and I%OPERAND_BUS_H<SQ_VALID2> are the source queue entry valid bits. 
I%OPERAND_BUS_H<SQ_VALID1> indicates that the information on I%OPERAND_BUS_H<10:4> is 
for a valid source queue entry. I%OPERAND_BUS_H<SQ_VALID2> indicates the information on 
I%OPERAND_BUS_H<3:0> is for a valid source queue entry. I%OPERAND_BUS_H<10:4> contains 
the information for any specifier that is placed on SPEC_CTRL. I%OPERAND_BUS_H<3:0> contains 
the second source queue entry whenever the specifier on SPEC_CTRL has an access type of 
Read or Modify and a data length of quadword or it is an RMODE specifier with access 
type VR or VM and the VS bit is set. I%OPERAND_BUS_H<SQ_VALID2> is set only if 
I%OPERAND_BUS_H<SQ_VALID1> is set. 

The addressing mode of the operand specifiers determines the value of the source queue 
entries. For short literal (Modes 0-3) addressing modes, I%OPERAND_BUS_H<VALUE1> contains 
the short literal data directly, with I%OPERAND_BUS_H<SHLIT> set. Source queue entries 
for register (Mode 5) addressing mode specifiers contain pointers to the referenced GPR, 
with I%OPERAND_BUS_H<GPR> set and I%OPERAND_BUS_H<SHLIT> cleared. Source queue 
entries for all other addressing modes contain pointers to Memory Data (MD) registers 
in the Ebox, with I%OPERAND_BUS_H<GPR> and I%OPERAND_BUS_H<SHLIT> both cleared. 
I%OPERAND_BUS_H<VFIELD> is set for variable bit field specifiers and cleared otherwise. This 
bit is used by the Ebox to make Field Queue entries. 

The access type and data length of the operand being specified determine the number of source 
queue entries that are written for all operands except those with access types VR or VM. Read 
(R) and Modify (M) access type operands write one source queue entry if the operand data length 
is byte, word, or longword, and two source queue entries if the operand data length is quadword. 
Write (W) access type operands never write any source queue entries. Address (A) access type 
operands always write one source queue entry regardless of the operand data length. The number 
of source queue entries written for non-field access type operands is summarized in Table 7-18. 



Table 7-18: Source Queue Entries Written for Non-field Access Type Operands 

Access Type    Data Length               Number of Source Queue Entries Written 

Read (R)       Byte, Word, Long          1 source queue entry written 
Modify (M)     Byte, Word, Long          1 source queue entry written 
Write (W)      Byte, Word, Long, Quad    0 source queue entries written 
Address (A)    Byte, Word, Long, Quad    1 source queue entry written 
Read (R)       Quad                      2 source queue entries written 
Modify (M)     Quad                      2 source queue entries written 



For VR and VM operands, the VS bit associated with the instruction and the addressing mode 
determine the number of source queue entries that are written. For these variable bit field access 
type operands, VS performs a function similar to the data length in non-field operands. The VS 
bit specifies how many source queue entries to write for VM and VR operands with RMODE 
specifiers. The value of VS is ignored if the access type of the operand is not VR or VM. If VS 
is 0 then one source queue entry is written for VR and VM operands with an RMODE specifier. 
If VS is 1 then two source queue entries are written for VR and VM operands with an RMODE 
specifier. Only one source queue entry is written for VR and VM operands with non-RMODE 
specifiers, regardless of the value of VS. Table 7—19 shows the number of source queue entries 
written for operands with VR or VM access types. 



Table 7-19: Source Queue Entries Written for VR or VM Access Type Operands 

VS    Access Type    Number of Source Queue Entries Written 

0     RMODE          1 source queue entry written 
1     RMODE          2 source queue entries written 
X     non-RMODE      1 source queue entry written 
VS is supplied by the IBU in the middle of the cycle in which the opcode is loaded and is held 
throughout the parsing of the instruction. 
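
The rules of Table 7-18 and Table 7-19 reduce to a small lookup. The sketch below is illustrative only; the function name and the string encodings of access type and data length are assumptions, and reserved addressing mode faults and index specifiers (which write no entries) are not modeled.

```python
def source_queue_entries(access_type, data_length=None, rmode=False, vs=0):
    """Number of source queue entries written for one operand.

    access_type: 'R', 'M', 'W', 'A', 'VR', or 'VM' (hypothetical encoding).
    data_length: 'B', 'W', 'L', or 'Q'; ignored for VR and VM operands.
    rmode, vs:   only consulted for VR and VM operands.
    """
    if access_type in ("VR", "VM"):
        # VS plays the role of the data length, but only for RMODE specifiers.
        return 2 if (rmode and vs == 1) else 1
    if access_type == "W":
        return 0
    if access_type == "A":
        return 1
    if access_type in ("R", "M"):
        return 2 if data_length == "Q" else 1
    raise ValueError("unknown access type: %r" % (access_type,))
```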

7.4.1.1.1 Short Literal Specifiers (Modes 0..3) 

Short literal specifiers create a source queue entry with the SHLIT flag set and the short literal 
data in I%OPERAND_BUS_H<VALUE1>. The short literal data is the full RN_SHORT_LITERAL<6:1> 
from the specifier control bus. For quadword operands the OQU writes two source queue entries. 
In this case, I%OPERAND_BUS_H<VALUE2> is 0, I%OPERAND_BUS_H<VALUE1> contains the 
short literal value, I%OPERAND_BUS_H<SHLIT> is set, and I%OPERAND_BUS_H<SQ_VALID1> and 
I%OPERAND_BUS_H<SQ_VALID2> are both set to indicate 2 source queue entries. 

Short literal addressing modes for VM and VR access type operands cause a reserved addressing 
mode fault to be signaled to the Ebox. All reserved addressing mode faults block the OQU from 
writing any source or destination queue entries. See Section 7.9.5 for details on these faults. 

7.4.1.1.2 RMODE Specifiers (Mode 5) 

Register mode specifiers create source queue entries with I%OPERAND_BUS_H<REG1> pointing to 
the specified Ebox GPR index and the SHLIT bit clear. The contents of I%OPERAND_BUS_H<REG1> 
are taken directly from the specifier control bus RN field. I%OPERAND_BUS_H<GPR> is equal to 
1 for register mode operands. If two entries are allocated for an operand due to quadword data 
length or the VS bit, the value for the second entry on I%OPERAND_BUS_H<REG2> is the value of 
the first entry on I%OPERAND_BUS_H<REG1> incremented by 1 and modulo 16. For specifiers of 
type VR or VM, I%OPERAND_BUS_H<VFIELD> is set to indicate a variable bit field specifier; 
it is cleared otherwise. 

7.4.1.1.3 Index Mode Specifiers (Mode 4) 

Indexed specifiers are processed by the IBU as two specifiers. Only the second specifier, the base, 
may create a source queue entry. The first specifier is recognized and ignored by the OQU if it 
is a complex specifier with the dispatch field of the specifier control bus pointing to index mode. 
Therefore, if SPEC_CTRL<COMPLEX> is set and SPEC_CTRL<DISPATCH> is index mode, then no 
source queue entries will be made for the specifier. 

7.4.1.1.4 All Other Addressing Modes 

Specifiers which are not literal or register mode create source queue entries with the 
I%OPERAND_BUS_H<REG1> fields pointing to Ebox MDs and the SHLIT and GPR bits clear. One 
MD is allocated for each source queue entry of this type written. See Section 7.4.1.4 for more 
detail on MD allocation. If two entries are allocated for an operand due to quadword data length or 
RMODE with the VS bit set, the I%OPERAND_BUS_H<REG2> field for the second entry is equal to the 
I%OPERAND_BUS_H<REG1> field of the first incremented by 1 and modulo 6. The most significant 
bits of both I%OPERAND_BUS_H<REG1> and I%OPERAND_BUS_H<REG2> are set to 1 to correspond 
with Ebox register file addressing. For specifiers of type VR or VM, the VFIELD bit is set to indicate 
a variable bit field specifier and cleared otherwise. Only one specifier per instruction may be of 
access type VR or VM, so as not to overflow the field queue. 
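
The two wrap-around rules for the second queue entry (modulo 16 for GPRs, modulo 6 for MDs) can be compared side by side. This is a sketch under stated assumptions: the helper names are hypothetical, and the MD tag is assumed to carry the MD number 0..5 in the low three bits of the 4-bit REG field with bit 3 forced to 1.

```python
def second_gpr(reg1):
    """REG2 for a register mode operand: REG1 incremented by 1, modulo 16."""
    return (reg1 + 1) % 16


MD_TAG_MSB = 0x8  # assumption: the most significant bit of the 4-bit REG field


def md_tags(md_index):
    """Return (REG1, REG2) tags for a two-entry operand starting at MD md_index.

    Assumes md_index is the MD number 0..5 supplied by the MD allocator.
    """
    if not 0 <= md_index < 6:
        raise ValueError("there are 6 MD registers")
    reg1 = MD_TAG_MSB | md_index
    reg2 = MD_TAG_MSB | ((md_index + 1) % 6)  # wraps from MD5 back to MD0
    return reg1, reg2
```

A register pair starting at R15 wraps to R0, while an MD pair starting at MD5 wraps to MD0.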
7.4.1.2 Destination Queue Interface 

The OQU can write up to two destination queue entries each cycle depending on the access types 
and data lengths of the operands they specify. The addressing mode of the operand specifier 
determines the contents of the destination queue entries written. Destination queue entries for 
register (Mode 5) addressing mode specifiers contain pointers to the referenced GPR and the GPR 
flag is set to indicate a register mode destination. All destination queue entries for specifiers with 
an access type write will contain pointers to the referenced GPR, regardless of addressing mode. 
For non-register mode specifiers of access types read and modify, the I%OPERAND_BUS_H<REGl> 
and I%OPERAND_BUS_H<REG2> fields are used by the source queue and ignored by the destination 
queue. All addressing modes other than register mode (Mode 5) and short literal (Modes 0..3) 
clear the GPR flag to indicate a memory destination. I%OPERAND_BUS_H<DQ_VALID1> is set if 
there is a valid destination queue entry. I%OPERAND_BUS_H<DQ_VALID2> indicates a second 
destination queue entry is also valid. I%OPERAND_BUS_H<DQ_VALID2> will only be set if 
I%OPERAND_BUS_H<DQ_VALID1> is also set. 

Short literal addressing mode specifiers for operands with access types of Write (W), Modify (M), 
and VM cause Reserved Addressing Mode Faults. Reserved Addressing Mode Faults block the 
OQU from writing any source or destination queue entries. See Section 7.9.5 for details on these 
faults. 

The access type and data length of the operand being specified determine the number of 
destination queue entries that are written for all operands except those with access types 
VR or VM. Write (W) and Modify (M) access type operands write one destination queue entry if the 
operand data length is byte, word, or longword, and two destination queue entries if the operand 
data length is quadword. The number of destination queue entries written for non-field access 
type operands is summarized in Table 7-20. 

Table 7-20: Destination Queue Entries Written for Non-field Access Type Operands 

Access Type    Data Length         Number of Destination Queue Entries Written 

Read (R)       Byte, Word, Long    0 destination queue entries written 
Modify (M)     Byte, Word, Long    1 destination queue entry written 
Write (W)      Byte, Word, Long    1 destination queue entry written 
Address (A)    Byte, Word, Long    0 destination queue entries written 
Read (R)       Quadword            0 destination queue entries written 
Modify (M)     Quadword            2 destination queue entries written 
Write (W)      Quadword            2 destination queue entries written 
Address (A)    Quadword            0 destination queue entries written 



For VR access type operands no destination queue entries are written. For VM access type 
operands, the VS bit associated with the instruction and the addressing mode of the operand 
specifier determine the number of destination queue entries that are written. The VS bit specifies 
how many destination queue entries to write for VM access type operands with RMODE specifiers. 
The value of VS is ignored if the access type of the operand is not VM. If VS is 0 then one destination 
queue entry is written for VM access type operands with an RMODE specifier. If VS is 1 then two 
destination queue entries are written for VM access type operands with an RMODE specifier. VM 
access type operands with non-RMODE specifiers create no destination queue entries. Table 7—21 
shows the number of destination queue entries written for operands with VM access type. 
Table 7-21: Destination Queue Entries Written for VM Access Type Operands 

VS    Access Type    Number of Destination Queue Entries Written 

0     RMODE          1 destination queue entry written 
1     RMODE          2 destination queue entries written 
X     non-RMODE      0 destination queue entries written 
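
The destination queue rules of Table 7-20 and Table 7-21 can be written as one function. As before, this is an illustrative sketch: the function name and string encodings are assumptions, and reserved addressing mode faults are not modeled.

```python
def destination_queue_entries(access_type, data_length=None, rmode=False, vs=0):
    """Number of destination queue entries written for one operand.

    access_type: 'R', 'M', 'W', 'A', 'VR', or 'VM' (hypothetical encoding).
    data_length: 'B', 'W', 'L', or 'Q'; ignored for VR and VM operands.
    """
    if access_type in ("R", "A", "VR"):
        return 0  # read-only and address operands have no destination
    if access_type in ("W", "M"):
        return 2 if data_length == "Q" else 1
    if access_type == "VM":
        if not rmode:
            return 0  # non-RMODE VM operands create no destination entries
        return 2 if vs == 1 else 1
    raise ValueError("unknown access type: %r" % (access_type,))
```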



7.4.1.2.1 RMODE Specifiers (Mode 5) 

Register mode specifiers create destination queue entries with I%OPERAND_BUS_H<REG1> 
pointing to the specified Ebox GPR and the I%OPERAND_BUS_H<GPR> bit set. The contents 
of the I%OPERAND_BUS_H<REG1> field are taken directly from the specifier control bus RN 
field. If two entries are allocated for an operand due to quadword data length or the VS bit, 
I%OPERAND_BUS_H<REG2> for the second entry is I%OPERAND_BUS_H<REG1> incremented by 
1 and modulo 16. I%OPERAND_BUS_H<DQ_VALID1> and I%OPERAND_BUS_H<DQ_VALID2>, the 
destination queue entry valid bits, are both set. 

7.4.1.2.2 Index Mode Specifiers (Mode 4) 

Indexed specifiers are processed by the IBU as two specifiers. Only the second specifier, the base, 
may create a destination queue entry. The first specifier is recognized and ignored when the 
specifier control bus has a complex specifier with the dispatch field pointing to index mode. In 
other words, if SPEC_CTRL<COMPLEX> is set and SPEC_CTRL<DISPATCH> equals index mode, then 
no destination queue entries will be made for the specifier. 

7.4.1.2.3 All Other Addressing Modes 

All other addressing modes create destination queue entries with the GPR bit clear. If two entries 
are allocated for an operand due to quadword data length or VM access type with the VS bit set, 
the GPR bit applies to both entries. 

7.4.1.3 Queue Entry Allocation 

The OQU tracks the availability of Source and Destination Queue entries using an up-down 
counter for each. When the OQU allocates source queue entries, the source queue counter 
increments by the number of entries allocated. When the OQU allocates destination queue entries, 
the destination queue counter increments by the number of entries allocated. When the source 
queue counter equals 12, the source queue is full. When the destination queue counter equals 6, 
the destination queue is full. 

The source and destination queue counters decrement whenever the Ebox retires entries from 
the respective queues. The signals E%SQ_RETIRE_H<1:0> and E%DQ_RETIRE_H<0> are generated 
by the Ebox, and indicate the number of source and destination queue entries, respectively, to 
be retired this cycle. Up to two source queue entries and one destination queue entry may be 
retired each cycle. The E%SQ_RETIRE_H<1:0> signal decode is shown in Table 7-22. 
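
A minimal sketch of the two up-down counters follows. It is a software illustration only; the class name is hypothetical, and the depths of 12 and 6 are the full thresholds given in the text.

```python
class QueueCounter:
    """Up-down occupancy counter for one Ebox queue (hypothetical model)."""

    def __init__(self, depth):
        self.depth = depth
        self.count = 0

    @property
    def full(self):
        return self.count == self.depth

    def allocate(self, n):
        # The counter increments by the number of entries allocated.
        if self.count + n > self.depth:
            raise OverflowError("queue would overflow")
        self.count += n

    def retire(self, n):
        # The counter decrements when the Ebox retires entries.
        if n > self.count:
            raise ValueError("retiring more entries than are allocated")
        self.count -= n

    def reset(self):
        self.count = 0


source_queue = QueueCounter(12)      # source queue is full at 12
destination_queue = QueueCounter(6)  # destination queue is full at 6
```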
7.4.1.4 MD Allocation 

MDs are allocated in the OQU using an up-down allocate counter and an index counter. When 
the OQU allocates a new MD, the allocate counter increments and the current value of the index 
counter is sent to the CSU and then incremented modulo 6. Whenever a source queue entry which 
points to an MD is retired by the Ebox, the allocate counter decrements. The value of the allocate 
counter always represents the number of previously allocated MDs and the index counter always 
points to the next MD to allocate. When the allocate counter equals 6 there are no MDs left 
to allocate. The signals E%SQ_RETIRE_MD_H<1:0> are generated by the Ebox and indicate the 
number of MD source queue entries to be retired this cycle. The E%SQ_RETIRE_MD_H<1:0> signal 
decode is shown in Table 7-22. 



Table 7-22: Source Queue Entries Retired 

E%SQ_RETIRE_H<1:0>    # SQ Entries Retired    E%SQ_RETIRE_MD_H<1:0>    # MD SQ Entries Retired 

0 0                   0                       0 0                      0 
0 1                   1                       0 1                      1 
1 0                   1                       1 0                      1 
1 1                   2                       1 1                      2 
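
The MD allocation scheme and the retire decode of Table 7-22 can be sketched together. This is an illustrative model only; the class and function names are hypothetical.

```python
NUM_MDS = 6  # the Ebox register file has 6 MD registers


def retired_count(retire_bits):
    """Decode a 2-bit retire signal per Table 7-22 (one entry per bit set)."""
    return {0b00: 0, 0b01: 1, 0b10: 1, 0b11: 2}[retire_bits & 0b11]


class MDAllocator:
    """Toy model of the OQU allocate counter and index counter."""

    def __init__(self):
        self.allocated = 0  # number of MDs currently allocated
        self.index = 0      # next MD to allocate

    def allocate(self):
        if self.allocated == NUM_MDS:
            raise RuntimeError("no MDs left to allocate")
        md = self.index                          # value sent to the CSU
        self.index = (self.index + 1) % NUM_MDS  # then incremented modulo 6
        self.allocated += 1
        return md

    def retire(self, retire_md_bits):
        # Models E%SQ_RETIRE_MD_H<1:0>.
        n = retired_count(retire_md_bits)
        if n > self.allocated:
            raise ValueError("retiring more MDs than are allocated")
        self.allocated -= n

    def reset(self):
        # Models E%STOP_IBOX_H or a branch mispredict.
        self.allocated = 0
        self.index = 0
```

Because the index counter wraps independently of the allocate counter, the seventh allocation reuses MD0 once an earlier entry has been retired.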



7.4.1.5 Specifier Bus Enable 

The OQU applies back-pressure to the IBU whenever there are insufficient MDs or source and 
destination queue entries to hold more operands. SPEC_CTRL_ENABLE is driven by the OQU 
to enable the driving of specifier data on the specifier control bus. SPEC_CTRL_ENABLE, when 
asserted, allows the IBU to drive a specifier on SPEC_CTRL<21:0>. 

The number of available source queue entries, destination queue entries, and MDs determine 
whether a specifier may be parsed by the IBU and driven on the specifier control bus. 
SPEC_CTRL_ENABLE is asserted if there are at least 2 source queue entries, 2 destination queue 
entries, and 2 MDs available. 
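
The enable condition can be stated as a single predicate. In this sketch the function name and argument names are hypothetical; the capacities default to the source queue, destination queue, and MD counts given elsewhere in this section.

```python
def spec_ctrl_enable(sq_used, dq_used, mds_used,
                     sq_depth=12, dq_depth=6, md_count=6):
    """True when at least 2 SQ entries, 2 DQ entries, and 2 MDs are available."""
    return (sq_depth - sq_used >= 2
            and dq_depth - dq_used >= 2
            and md_count - mds_used >= 2)
```

The threshold of 2 matches the largest single-specifier demand described above, since one specifier can require up to two source queue entries, two destination queue entries, and two MDs.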

7.4.1.6 E%STOP_IBOX and Branch Mispredict 

The following actions take place when the Ebox issues an E%STOP_IBOX_H or a branch mispredict: 

• The MD allocation counter and index counter are both reset to 0 

• The source queue counter is reset to 0 

• The destination queue counter is reset to 0 

• Any specifiers currently being processed will not make a queue entry. 
7.4.2 Complex Specifier Unit 

The Complex Specifier Unit (CSU) processes all specifiers with modes other than short literal 
or register. It receives parsed instruction stream data and parameters on the specifier control 
fields. Using a 32-bit, 3-stage pipelined datapath with microcode control, the CSU performs the 
register and memory data operations required to provide the Ebox with instruction operands. 
Final operand values are routed to the Ebox memory data registers. 

7.4.2.1 CSU Microcode Control 

The CSU microsequencer provides microcoded control for the 3-stage pipelined datapath. Under 
typical operation, a control store address is generated for the 128-entry X 29-bit control store array 
and a new microword is referenced every cycle. The complete microword depicted in Table 7—23 
is issued and forwarded to the subsequent pipeline stages in consecutive cycles in order to control 
the datapath logic in those stages. 

Figure 7-10: Microword Format 

[Figure: bit layout of the 29-bit CSU microword.] 




Table 7-23: Microword Fields 

Field       Description 

ALU_FNC     controls the ALU function 
ML          selects mem req data length = long or DL 
A           IA_bus source 
B           IB_bus source 
DST         IW_bus destination 
MISC        miscellaneous functions 
MREQ_FNC    controls memory request function 
DEC_NEXT    conditional control of decoder next 
NXT_ADDR    full next microaddress field 



The 128-entry control store array is arranged as 8 pages of 16 microwords per page. Bits <6:4> 
of the control store address designate the microcode page, bits <3:0> designate the microword 
address within a page. The page organization places the microcode corresponding to a unique 
complex specifier flow within a particular page. 
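
The page and offset split of the 7-bit control store address is a simple bit-field operation. The helper names below are hypothetical and the code is only an illustration of the address layout.

```python
def split_control_store_address(addr):
    """Return (page, word) for a 7-bit control store address.

    Bits <6:4> select one of 8 pages; bits <3:0> select one of 16 microwords.
    """
    if not 0 <= addr < 128:
        raise ValueError("control store addresses are 7 bits")
    return (addr >> 4) & 0x7, addr & 0xF


def join_control_store_address(page, word):
    """Inverse of split_control_store_address."""
    if not (0 <= page < 8 and 0 <= word < 16):
        raise ValueError("page must be 0..7 and word 0..15")
    return (page << 4) | word
```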
Table 7-24: Microcode Page Allocation 

Page    Description 

000     displacement flows, modes=A,C,E 
001     displacement deferred flows, modes=B,D,F 
010     auto increment flow, mode=8 
011     auto increment deferred flow, mode=9 
100     register deferred flow, mode=6 
101     auto decrement flow, mode=7 
110     IPR and utility routines, index flow, mode=4 
111     Ebox assists, idle address 



The CSU specifier microcode processes VAX defined specifiers 4 and 6-F. These are the operand 
specifiers that the Ibox defines as complex. Displacement data will be sign extended by the IBU so 
the CSU can process byte, word and longword displacement specifiers in a longword microcode flow. 
Displacement deferred specifiers merge together in a similar fashion. Ebox assists are "implicit" 
operands in some of the VAX opcodes. In order to simplify Ebox microcode to handle instruction 
execution only, the implicit specifiers are processed up front by the Ibox. These assists appear to 
the Ebox as typical complex operands. See Section 7.3.2.7 for more information on assists. 



7.4.2.2 CSU Pipeline 

The 3-stage CSU pipeline operates under microcode control during the S1, S2, and S3 stages of 
the Ibox pipeline. Control store address generation, control store lookup, and microword issue 
occur in the S1 stage. The datapath source busses are driven during the S2 pipeline stage. The 
S3 stage contains the ALU and write destination bus logic, and memory request logic. 

Ordinarily, microwords move through the pipeline synchronously, advancing every cycle. Stalls 
occur when a resource required for a particular pipeline stage is unavailable. Stalls operate 
synchronously and transparently to the microcode flow by freezing the sequence and the pipeline, 
thereby causing the CSU logic to repeat the operation performed in the previous cycle. The stall 
terminates upon acquisition of the resource which caused the stall and the pipeline flow returns 
to normal, advancing every cycle. 
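
The freeze-and-repeat behavior of a stall can be illustrated with a toy three-stage model. This sketch makes a simplifying assumption that is not taken from the text: a single stall input freezes all three stages at once, whereas the hardware distinguishes per-stage stalls. The function name is hypothetical.

```python
def step(pipeline, next_word, stall):
    """Advance a (S1, S2, S3) pipeline by one cycle.

    Returns (new_pipeline, completed_word). When stall is true the pipeline
    is frozen, so every stage repeats the operation of the previous cycle
    and nothing completes.
    """
    if stall:
        return pipeline, None
    s1, s2, s3 = pipeline
    return (next_word, s1, s2), s3
```

Calling `step` repeatedly with `stall=False` moves each microword one stage per cycle; a cycle with `stall=True` leaves the tuple unchanged, which is the software analogue of the stall being transparent to the microcode flow.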

7.4.2.2.1 S1 Pipeline Stage 

The S1 pipe latch, also called the dispatch latch, controls the S1 pipeline logic. The S1 pipe latch 
is loaded from the parsed instruction stream data and parameters shown in Table 7-25. 

S1_RN, S1_AT, S1_DL, S1_DISPATCH, S1_AT_RMW, S1_INDEXED, S1_ASSIST, and S1_PC_MODE load 
directly from the specifier control field and the specifier complex control field as driven by the IBU. 
S1_REG_INDEX loads from the MD_INDEX lines coming from the OQU. The 32-bit S1_IB_DATA 
and S1_IBOX_PC are loaded from SPEC_DATA<31:0> and IBU_PC<31:0> respectively. 

S1_RXS_SCORE and S1_RXD_SCORE load from the entry in the SBU scoreboard array pointed to by 
the GPR number of the specifier. S1_RXS_SCORE and S1_RXD_SCORE represent "snapshot" values of 
the scoreboard, taken when a specifier dispatch enters the S1 pipe latch. The scoreboard updates 
the value of these entries based on the Ebox retiring source and destination queue entries. See 
Section 7.4.3 for scoreboard details. The snapshot values decrement in parallel with the SBU 
values. 
Table 7-25: S1 Pipe Latch 

Bit Field    Field Name       Description 

<3:0>        S1_RN            GPR number from the specifier. 
<6:4>        S1_AT            Access Type of the operand associated with the specifier. 
<8:7>        S1_DL            Data length of the operand associated with the specifier. 
<12:9>       S1_RXS_SCORE     Value of scoreboard source queue counter indexed by GPR number. 
<15:13>      S1_RXD_SCORE     Value of scoreboard dest queue counter indexed by GPR number. 
<18:16>      S1_DISPATCH      [description illegible] 
<19>         S1_AT_RMW        Access Type of operand is R, M or W. 
<20>         S1_INDEXED       The base specifier has an index specifier. 
<21>         S1_ASSIST        Ebox assist specifier. 
<22>         S1_PC_MODE       The specifier uses program counter addressing. 
<25:23>      S1_REG_INDEX     Value of OQU MD allocation pointer. 
<57:26>      S1_IB_DATA       Data for immediate and displacement mode specifiers. 
<89:58>      S1_IBOX_PC       The PC of the next Istream byte following this specifier. 
<90>         S1_VALID         S1 pipe latch valid bit. 
<91>         S1_JMP_OR_JSB    Indicates whether the instruction was JMP or JSB. 



The S1_VALID bit indicates that the S1 pipe latch contains valid dispatch arguments waiting to 
be serviced. The CSU recognizes the availability of the valid complex dispatch, and performs the 
control store access. The microword is issued in S1 and loaded into the S2 pipe latch. The CSU 
sets S1_VALID when a complex specifier is parsed by the IBU and doesn't advance to stage S2 
the following cycle. This is a result of an S1_STALL. The S1 logic clears S1_VALID upon successful 
transition of the S1 microword into the S2 pipe latch. The clear S1_VALID bit indicates the 
availability of the S1 pipe stage for a new complex specifier dispatch next cycle. 

The S1_STALL condition occurs when the S1 context latch cannot be loaded immediately into 
the S2 pipe latch. This condition may occur during an S2_STALL, when I_IBU%QUAD_FLAG_H<0> is 
asserted, or during a multiple microword flow. S2_STALL indicates that the S2 pipe latch cannot 
currently advance (see Section 7.4.2.2.2 for more details on the S2_STALL). Naturally this stall 
ripples back to become an S1_STALL as well because the S1 microword cannot advance into the S2 
pipe latch. I_IBU%QUAD_FLAG_H<0> indicates the IBU is waiting for the second longword of a 
quadword immediate mode specifier. Once the second longword is retired, I_IBU%QUAD_FLAG_H<0> is 
de-asserted and the CSU is allowed to process the quadword immediate mode specifier. During 
multiple microword flows, the next control store address is generated from the microword in the 
S2 pipe latch. Consequently, the S1 pipe latch may accept one dispatch from the IBU which sets 
S1_VALID. The dispatch in the S1 pipe latch is then in the S1_STALL condition waiting for service. 

The IBU uses S1_VALID as part of the parser enable equation. If S1_VALID is clear then the IBU 
may parse a complex specifier and retire the instruction stream from the PFQ. If S1_VALID is set 
then if the IBU parses a complex specifier it cannot retire the instruction stream because the S1 
pipe latch cannot accept the dispatch. The IBU stalls the parser such that the same specifier is 
parsed in subsequent cycles. 
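
The stall and parser-enable conditions described above can be summarized as two predicates. This is an illustrative sketch, not the actual logic equations: the function and argument names are hypothetical, and the real parser enable equation has other terms of which S1_VALID is only one.

```python
def s1_stall(s2_stall, quad_flag, multi_microword_flow):
    """S1 cannot load into S2 if any of the three listed conditions holds.

    s2_stall:             the S2 pipe latch cannot advance
    quad_flag:            models I_IBU%QUAD_FLAG_H<0>
    multi_microword_flow: the next address is coming from the S2 microword
    """
    return bool(s2_stall or quad_flag or multi_microword_flow)


def ibu_may_retire_complex_specifier(s1_valid):
    """The S1_VALID term of the parser enable: retire only if S1 is free."""
    return not s1_valid
```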




NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 



Typical microcode flows begin at a microcode address determined by a complex specifier dispatch. 
A DECODER_NEXT directive in the S2 pipe latch tells the microsequencer that the next microcode 
address is not related to the current flow. If S1_VALID indicates a valid dispatch waiting in the S1 
pipe latch and the S2 pipe latch contains a DECODER_NEXT, then the microsequencer selects the 
S1 pipe latch as the source of the next microaddress. This begins a new microcode flow for the 
specifier being dispatched. The microcode sequences through a flow using microaddress jumps. 
A jump selects the NXT_ADDR<6:0> field of the microword in the S2 pipe latch directly for the 
next microword address. The final microword of each flow contains a DECODER_NEXT which once 
again requests a new dispatch address. 

Requests for IPR references, which are detailed in Section 7.4.2.8, must guarantee that the CSU 
is idle. Thus, whenever the S1 logic detects an IPR read strobe from the Ebox, the next 
microaddress is selected by the IPR number. The request immediately dispatches to the utility 
microcode page. 

The unwind_mispredict routine is selected when the Ebox signals a branch mispredict. The 
RLOG unwinds, restoring the GPRs until the RLOG is empty, then the Ibox is restarted. 

The CSU dispatches to the common entry point for the single microword index routine when the 
dispatch number of a specifier indicates that it is an index. The index register is read from the 
Ebox and shifted by length = DL. 

The microaddress control selects the IDLE address when no valid dispatch or utility dispatch 
awaits processing. The IDLE microword simply jumps to its own address and executes the 
DECODER_NEXT directive, awaiting a valid dispatch. 

In addition to the standard DECODER_NEXT directive, the microcode and next address logic 
supports a conditional DECODER_NEXT. The DECODER_NEXT_IF_BWL directive performs a standard 
DECODER_NEXT if the data length associated with the specifier is byte, word, or longword. For 
quadword data length the next address logic performs a microaddress jump. 

The microcode and next address logic supports one conditional jump. The 
BRANCH_IF_RLOG_EMPTY directive causes the next microaddress logic to perform a standard jump, 
but in addition a 1 is logically ORed into next microaddress bit <0> if 
the RLOG is empty. The RLOG unwind microcode uses this conditional jump feature. A single 
microword jumps to itself as long as the RLOG still has valid entries. When the RLOG empties, 
the microword conditionally jumps out of the loop. See Section 7.4.2.3 for RLOG details. 

The S1 logic uses a five-input multiplexer to select the source of the next control store address. 
Both the complex specifier multiplexer input and Ebox assist multiplexer input use data from the 
S1 pipe latch to form the next address. The IPR multiplexer input uses the latched IPR number 
from the Ebox to select which IPR type field will be used to form the next address. The next 
address field from the S2 microword enters another multiplexer input in order to perform the 
microaddress jump. The final multiplexer input is the idle address. Next address generation is 
summarized by Table 7-26. 

Table 7-26: Next Address Generation Fields 

Bit Field   Field Name         Description 

Specifier Dispatch 
<0>         -                  forced to 0 
<1>         S1_INDEXED         index specifier 
<2>         S1_PC_MODE         base register is the PC 
<3>         S1_AT_RMW          access type = read, modify, or write 
<6:4>       S1_DISPATCH        S1_DISPATCH<2:0> field from the qu 

Assist Dispatch 
<0>         -                  forced to 0 
<3:1>       S1_DISPATCH<2:0>   assist type 
<6:4>       -                  forced to 111, assist page number 

IPR and Utility Dispatch 
<0>         -                  forced to 0 
<3:1>       000                index routine 
<3:1>       001                IPR unwind RLOG/read back-up PC 
<3:1>       010                E%BRANCH_MISPREDICT_L 
<3:1>       011                IPR read 
<6:4>       -                  forced to 110, IPR/utility page number 

Idle Dispatch 
<6:0>       -                  forced to 1111111, idle address 

Next Address 
<0>         NXT_ADDR           next address field from the S2 microword. For conditional jump, OR in 1 if 
                               RLOG is empty 
<6:1>       NXT_ADDR           next address field from the S2 microword 
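
The address formats in Table 7-26 can be illustrated with a short Python sketch. It only composes the 7-bit microaddress for each multiplexer input; the function and constant names are hypothetical, and the selection priority among the five inputs is not modeled because the table does not state it.

```python
# Sketch of the 7-bit next-microaddress formats listed in Table 7-26.
# All names below are illustrative; they are not taken from the specification.

IDLE_ADDRESS = 0b1111111  # idle dispatch: bits <6:0> forced to 1111111

# <3:1> routine codes on the IPR/utility page, as listed in Table 7-26
UTIL_INDEX = 0b000
UTIL_IPR_UNWIND = 0b001
UTIL_MISPREDICT = 0b010
UTIL_IPR_READ = 0b011


def specifier_dispatch(dispatch, at_rmw, pc_mode, indexed):
    """<6:4>=S1_DISPATCH, <3>=S1_AT_RMW, <2>=S1_PC_MODE, <1>=S1_INDEXED, <0>=0."""
    return (((dispatch & 0b111) << 4) | (int(at_rmw) << 3)
            | (int(pc_mode) << 2) | (int(indexed) << 1))


def assist_dispatch(assist_type):
    """<6:4> forced to 111 (assist page), <3:1>=assist type, <0>=0."""
    return (0b111 << 4) | ((assist_type & 0b111) << 1)


def utility_dispatch(code):
    """<6:4> forced to 110 (IPR/utility page), <3:1>=routine code, <0>=0."""
    return (0b110 << 4) | ((code & 0b111) << 1)


def jump(nxt_addr, branch_if_rlog_empty=False, rlog_empty=False):
    """Microaddress jump to NXT_ADDR<6:0>.

    For the conditional jump, a 1 is ORed into bit <0> when the RLOG is empty.
    """
    addr = nxt_addr & 0x7F
    if branch_if_rlog_empty and rlog_empty:
        addr |= 1
    return addr
```

With this encoding, an unwind microword placed at an even address can name itself in NXT_ADDR: it loops while the RLOG has entries and falls through to the odd neighbor address once the RLOG empties.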



7.4.2.2.2 S2 Pipeline Stage 

The S2 pipe latch controls the S2 pipeline datapath. Each cycle, the S2 pipe latch attempts to 
load a microword and specifier-specific parameters from the instruction stream. The S2 pipe latch 
is shown in Table 7-27. 

Table 7-27: S2 Pipe Latch 

Bit Field   Field Name       Description 

<3:0>       S2_RN            GPR number from the specifier. 
<6:4>       S2_AT            Access type of the operand associated with the specifier. 
<8:7>       S2_DL            Data length of the operand associated with the specifier. 
<11:9>      S2_REG_INDEX     Current value of S2 MD allocation pointer or WX index. 
<15:12>     S2_RXS_SCORE     Value of scoreboard source queue counter indexed by GPR number. 
<18:16>     S2_RXD_SCORE     Value of scoreboard dest queue counter indexed by GPR number. 
<47:19>     S2_MICROWORD     The microword issued in S1. 
<48>        S2_NEW_FLOW      Indicates the first microword of a flow. 
<49>        S2_JSB_OR_JMP    Indicates whether the instruction was JMP or JSB. 



S2_RN, S2_AT, S2_DL, S2_JSB_OR_JMP, S2_RXS_SCORE, and S2_RXD_SCORE load directly from the 
S1 pipe latch. S2_RXS_SCORE and S2_RXD_SCORE decrement in parallel with their corresponding 
SBU values. S2_REG_INDEX typically loads directly from S1_REG_INDEX; however, if the dispatch 
is for an IPR read, it loads a copy of WX_INDEX from the Ebox. 

The S2_MICROWORD field of the S2 pipe latch updates from the microword issued by the S1 pipe 
stage. During an initial specifier dispatch, all of the S2 pipe latch updates. Bits <48:19> of 
the latch update every cycle, assuming no stalls. However, bits <49,18:0> of the latch remain 
constant throughout the context of one specifier flow, except for local scoreboard decrements of 
S2_RXS_SCORE and S2_RXD_SCORE. This part of the S2 pipe latch does not load again until another 
dispatch occurs. This allows for multiple microword flows within the context of a given specifier. 
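
The split between the per-specifier context bits and the per-cycle bits of the S2 pipe latch can be sketched in Python as follows. The field positions come from Table 7-27; the helper names (`field_mask`, `pack`, `get_field`, `next_latch`) are hypothetical, and the local scoreboard decrements are deliberately left out of this simplified model.

```python
# Field positions (high bit, low bit) of the 50-bit S2 pipe latch, per Table 7-27.
S2_FIELDS = {
    "S2_RN": (3, 0),
    "S2_AT": (6, 4),
    "S2_DL": (8, 7),
    "S2_REG_INDEX": (11, 9),
    "S2_RXS_SCORE": (15, 12),
    "S2_RXD_SCORE": (18, 16),
    "S2_MICROWORD": (47, 19),
    "S2_NEW_FLOW": (48, 48),
    "S2_JSB_OR_JMP": (49, 49),
}


def field_mask(hi, lo):
    """Mask covering bits <hi:lo>."""
    return ((1 << (hi - lo + 1)) - 1) << lo


CONTEXT_MASK = field_mask(18, 0) | field_mask(49, 49)  # bits <49,18:0>
PER_CYCLE_MASK = field_mask(48, 19)                    # bits <48:19>


def pack(**fields):
    """Build a latch value from named fields (illustrative helper)."""
    latch = 0
    for name, value in fields.items():
        hi, lo = S2_FIELDS[name]
        latch |= (value << lo) & field_mask(hi, lo)
    return latch


def get_field(latch, name):
    hi, lo = S2_FIELDS[name]
    return (latch & field_mask(hi, lo)) >> lo


def next_latch(current, incoming, new_dispatch, stalled=False):
    """One latch update; scoreboard decrements are not modeled here."""
    if stalled:
        return current                      # a stall blocks the whole update
    if new_dispatch:
        return incoming                     # initial dispatch loads every bit
    # Within a flow only <48:19> reload; the specifier context is held.
    return (current & CONTEXT_MASK) | (incoming & PER_CYCLE_MASK)
```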

S2_NEW_FLOW indicates that the contents of the S2 pipe latch represent the first microword of a new 
dispatch. In other words, the microword address for the microword in S2 was generated in any 
manner other than a microaddress jump. This pipe bit aids the S3 stage in loading the specifier 
context portion of the S3 latch. See Section 7.4.2.2.3 for details. 

The S2 datapath contains the CSU register set and constant generator. The CSU ALU source 
busses, the IA_bus and IB_bus, are controlled by the microcode /A and /B fields to drive the 
source busses in the S2 pipeline stage. The CSU microcode may also request an Ebox GPR to 
source the IA_bus by providing the I%IBOX_IA_ADDR_H<3:0> from the S2_RN field of the S2 pipe 
latch. The Ebox register read is strobed with I%IBOX_IA_READ_H. The Ebox returns GPR data 
later that cycle on the E%IBOX_IA_BUS_H<31:0> lines. This provides a path for the CSU to obtain 
the base specifier register of the operand currently being processed. When the S2_MICROWORD is 
sourcing a GPR which is identical to the S3_MICROWORD destination register, the IW_BUS will be 
driven onto the source bus, bypassing the GPR read. 

Table 7-28: CSU Registers 

Register    Available   Written     Description 
Name        On          From 

T0          IA,IB       IW          temporary register 
IB_DATA     IA,IB       SPEC_DATA   immediate and displacement data 
RX          IA          IW          base specifier register 
IMD         IA          MD          Ibox memory data 
MD          -           IW          Ebox memory data register 
WX          -           IW          Ebox working register 
KDL         IB          -           1 for DL=Byte, 2 for Word, 4 for LONG, 8 for QUAD 
IDL         IB          -           1 for DL=Byte, 2 for Word, 4 for LONG, 4 for QUAD 
K4          IB          -           Constant 4 
K12         IB          -           Constant 12 
RLOG_RX     IA          IW_BUS      Register pointed to by top of RLOG 
RLOG_KDL    IB          -           Same as KDL except using DL from top of RLOG stack 
IBOX_PC     IA          IBU_PC      PC of instruction byte following last byte in specifier 



T0 is a temporary register for microcode use. IB_DATA and IBOX_PC are the S2 pipeline copies 
of S1_IB_DATA and S1_IBOX_PC respectively. IB_DATA and IBOX_PC are loaded along with the 
S2_PIPE_LATCH<18:0> on the first microword of a dispatch. Then the CSU microcode maintains 
control of these registers throughout the context of a given specifier flow. 

RX refers to the Ebox GPR register indexed by S2_RN. RLOG_RX refers to the Ebox GPR register 
indexed by the RLOG_RN. See Section 7.4.2.3 for more details. MD addresses the Ebox MD register 
indexed by S2_REG_INDEX. WX points to the Ebox working register also indexed by S2_REG_INDEX. 
K4 and K12 are constants. KDL is a constant based on S2_DL. The value of the constant is 1 for 
DL=0 (byte), 2 for DL=1 (word), 4 for DL=2 (longword), and 8 for DL=3 (quadword). IDL is a 
constant based on S2_DL for immediate mode specifiers with access type A or V. IDL differs from 
KDL in that the constant value is 4 for DL=3 (quadword). RLOG_KDL is a constant similar 
to KDL, but based on RLOG_DL. See Section 7.4.2.3 for more details. 
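
As a quick illustration of the length constants described above, the following Python sketch computes KDL and IDL from the 2-bit DL encoding. The function names are hypothetical; the values follow the paragraph above.

```python
def kdl(dl):
    """KDL: 1, 2, 4, 8 for DL = 0 (byte), 1 (word), 2 (longword), 3 (quadword)."""
    if dl not in (0, 1, 2, 3):
        raise ValueError("DL is a 2-bit field")
    return 1 << dl


def idl(dl):
    """IDL: same as KDL, except that the value is 4 for DL = 3 (quadword)."""
    return min(kdl(dl), 4)


K4 = 4    # constant 4
K12 = 12  # constant 12
```

RLOG_KDL would be obtained the same way as `kdl`, with RLOG_DL as the argument instead of S2_DL.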

For a majority of memory requests started by the CSU microcode, the Ibox memory data returns 
to the IMD register. The Mbox drives M%IBOX_DATA_L when M%MD_BUS_H<31:0> contains valid 
data from a specifier memory request. The IMD has a signal, IMD_VALID, associated with it. Each 
time the CSU microcode initiates a memory request, IMD_VALID is set. Each time memory data 
returns to IMD, IMD_VALID is reset. 

When M%MME_FAULT_H or M%HARD_ERR_H is asserted by the Mbox along with M%IBOX_DATA_L, 
this indicates that Ibox data on M%MD_BUS_H<63:0> is invalid and that the corresponding 
reference was associated with either a memory management exception or a hard error condition. 
In both cases the CSU continues to process the specifier, but sets flags indicating the IMD contains 
invalid data. The flags are reset at the end of each specifier flow. They are forwarded to stage 
S3 whenever the IMD is selected to source the IA_bus. They are called I%FORCE_MME_FAULT_H 
and I%FORCE_HARD_FAULT_H. When set they indicate to the Ebox and Mbox that the associated 
register write or Ibox reference should be forced to "look" like a memory management fault or a 
hardware fault from the Ibox point of view. 

The S2 pipeline stage stalls for three reasons: GPR destination queue stall (RXD_STALL), 
Ibox memory data stall (IMD_STALL), and S3_STALL. RXD_STALL occurs when the CSU microcode 
attempts a read of a GPR for which there exist outstanding writes in the Ebox destination 
queue. The S2 pipeline logic detects RXD_STALL when S2_RXD_SCORE does not equal 0, and the 
S2_MICROWORD attempts to read the GPR from the Ebox indexed by S2_RN. The stall breaks when 
the Ebox retires a destination queue entry that causes both the SBU counter and the snapshot 
S2_RXD_SCORE to decrement. Multiple destination queue entries may have to be retired, causing 
multiple decrements, before S2_RXD_SCORE equals 0. 

IMD_STALL occurs when the S2_MICROWORD attempts to read the IMD when IMD_VALID is set. This 
condition implies that a memory request was initiated by CSU microcode which set IMD_VALID, 
but memory data which resets the signal has not yet been returned. IMD_STALL can only happen 
in the context of one complex specifier flow when the Ibox requests then waits for memory data 
to be returned to IMD. 

S2_STALLS block the S2 pipeline latch update, causing the S2 stage to execute the same stalled 
MICROWORD until the stall breaks. If an S2 stall occurs that does not result from an S3 stall, the S3 
pipeline latch continues to update; however, NOPs are fed into the S3 pipeline latch while the 
S2 stall is in progress. When the stall breaks, the pipeline latches resume normal operation. 
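
The S2 stall conditions can be summarized in a small Python sketch. This is a simplified behavioral model under stated assumptions: the class and attribute names are hypothetical, and only the conditions given in the text are evaluated.

```python
from dataclasses import dataclass


@dataclass
class S2State:
    rxd_score: int   # snapshot S2_RXD_SCORE (outstanding Ebox writes to the GPR)
    imd_valid: bool  # IMD_VALID: set while requested memory data is outstanding
    reads_gpr: bool  # S2_MICROWORD reads the Ebox GPR indexed by S2_RN
    reads_imd: bool  # S2_MICROWORD reads the IMD
    s3_stall: bool   # back-stall from the S3 stage


def s2_stall_reasons(s):
    """Return the list of stall conditions currently holding the S2 stage."""
    reasons = []
    if s.reads_gpr and s.rxd_score != 0:
        reasons.append("RXD_STALL")
    if s.reads_imd and s.imd_valid:
        reasons.append("IMD_STALL")
    if s.s3_stall:
        reasons.append("S3_STALL")
    return reasons


def s3_input(s, s2_microword):
    """What the S3 latch loads this cycle: held, a NOP, or the S2 microword."""
    if s.s3_stall:
        return "HOLD"          # S3 latch itself is blocked
    if s2_stall_reasons(s):
        return "NOP"           # S2-only stall: NOPs are fed into S3
    return s2_microword


def retire_destination(s):
    """Ebox retires one destination queue entry for this GPR."""
    if s.rxd_score > 0:
        s.rxd_score -= 1
```

For example, with `rxd_score` equal to 2 the stage stays in RXD_STALL until two destination queue entries for that register have been retired.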

7.4.2.2.3 S3 Pipeline Stage 

The S3 pipe latch controls the S3 pipeline datapath. Each cycle, the S3 pipe latch attempts to 
load a microword and the specifier-specific parameters from the instruction stream. The S3 pipe 
latch is shown in Table 7-29. 

Table 7-29: S3 Pipe Latch 

Bit Field   Field Name       Description 

<3:0>       S3_RN            GPR number from the specifier. 
<6:4>       S3_AT            Access type of the operand associated with the specifier. 
<8:7>       S3_DL            Data length of the operand associated with the specifier. 
<11:9>      S3_REG_INDEX     Current value of S3 MD allocation pointer or WX index. 
<15:12>     S3_RXS_SCORE     Value of scoreboard source queue counter indexed by GPR number. 
<46:16>     S3_MICROWORD     The microword issued in S1. 
<47>        S3_JSB_OR_JMP    Indicates whether the instruction was JMP or JSB. 



S3_RN, S3_AT, S3_DL, S3_REG_INDEX, S3_JSB_OR_JMP, and S3_RXS_SCORE load directly from the S2 
pipe latch. S3_RXS_SCORE decrements in parallel with its corresponding SBU value. When S3 
logic initiates a memory reference with an MD destination, S3_REG_INDEX specifies the index into 
the MD register array for the memory data write. Such memory requests cause MD_INDEX to 
increment modulo the size of the MD register file, so that the data for quadword operands, which 
require two memory requests, occupy successive MD registers. 

The S3_MICROWORD field of the S3 pipe latch updates from the S2_MICROWORD. During the first 
microword of a specifier dispatch flow, as indicated by the contents of S2_NEW_FLOW, all of the S3 
pipe latch updates. The microword field in bits <46:16> continues to update every cycle, loading 
the new microword from S2. However, bits <47,15:0> of the latch remain constant throughout 
the context of one specifier flow, except for local scoreboard decrements of S3_RXS_SCORE, and 
local increments of S3_REG_INDEX. This part of the S3 pipe latch does not reload until another 
dispatch occurs, allowing for multiple microword flows within the context of a given specifier. 

The S3 datapath contains the CSU ALU and register write logic. The ALU maintains 32-bit input 
latches which load the IA_BUS and IB_BUS during an S3 pipe latch update. Under control of the 
microcode /ALU_FNC field the ALU performs 32-bit add, subtract, pass, and left bit-shift equal to 
S2_DL. The destination bus, IW_bus, provides the path to write the ALU results to one of the CSU 
registers under control of the microcode /DST field. The IW_bus can also be selected to write 
to the Ebox GPR, MD, and working (WX) registers. The I%IBOX_IW_BUS_H<31:0> lines are driven 
from the ALU output, and the S3_RN field of the S3 pipe latch provides I%IBOX_IW_ADDR_H<4:0> 
as an index into the GPR array. MD and WX writes both use the S3_REG_INDEX field of the S3 
pipe latch to provide I%IBOX_IW_ADDR_H<4:0> as an index into the Ebox register array. The Ebox 
register write is strobed with I%IBOX_IW_WRITE_H. 

The S3 stage logic initiates CSU memory requests based on the S3_MICROWORD. Along 
with a memory request command, the full 32-bit address is sent to the Mbox on the 
I%IBOX_ADDR_H<31:0> lines. These lines may be sourced from either the IA_BUS or IW_BUS, 
under the S3_MICROWORD /MREQ field control. If microcode selects the IA_BUS for the memory 
request address, the S3 pipe latch for the IA_BUS sources the address. The S3 logic also forwards 
VIC_REQ from VIC Istream requests to the Mbox when there are no specifier memory requests in 
the S3_MICROWORD. In this case, I%IBOX_ADDR_H<31:0> is sourced by VIC_REQ_ADDR from 
the VIC. 

The following control signals accompany I%IBOX_ADDR_H<31:0>. I%IBOX_CMD_L<4:0> indicates 
the reference type to the Mbox. See Section 12.3.1 in Chapter 12 for valid values. I%IBOX_TAG_L<4:0> 
contains the Ebox register file destination of a memory request, a copy of S3_REG_INDEX. 
I%IBOX_AT_L<1:0> and I%IBOX_DL_L<1:0> provide the Mbox with the access type and data 
length. I%IBOX_AT_L<1:0> is either a copy of S3_AT or forced to read or write depending on 
control of the microcode /MREQ field. I%IBOX_DL_L<1:0> is either a copy of S3_DL or forced to 
longword depending on control of the microcode /ML field. I%IBOX_REF_DEST_L<1:0> specifies 
the destination for memory request data. I%IBOX_REF_DEST_L<1> indicates that the Ebox MD 
registers are the destination. I%IBOX_REF_DEST_L<0> indicates that the Ibox IMD register is 
the destination. This field is decoded from the S3_MICROWORD memory field. The I%SPEC_REQ_H 
strobe is asserted for CSU specifier memory requests. The I%IREF_REQ_H strobe is asserted for 
VIC Istream memory requests. 

For JMP, JSB, and certain Ebox assists, the S3 logic sends requests to the BPU to load a new 
PC. The PC value may be sourced from either the I%IBOX_IW_BUS_H<31:0> or M%MD_BUS_H<31:0> 
under S3_MICROWORD /MISC field control, as indicated by LD_PC_WBUS or LD_PC_MD respectively. 

The S3 pipeline stage stalls for three reasons: GPR source queue stall (RXS_STALL), memory 
request stall (MRQ_STALL), and RLOG stall (RLOG_STALL). RXS_STALL occurs when the CSU microcode 
attempts to write a GPR destination for which there exist outstanding reads in the Ebox source 
queue. The S3 pipeline logic detects RXS_STALL when S3_RXS_SCORE does not equal 0, and 
the S3_MICROWORD attempts to write the GPR in the Ebox indexed by S3_RN. The stall breaks 
when the Ebox retires a source queue entry that causes both the SBU counter and the snapshot 
S3_RXS_SCORE to decrement. Multiple source queue entries may have to be retired, causing 
multiple decrements, before S3_RXS_SCORE equals 0. 

RLOG_STALL occurs when RLOG_FULL is asserted and the microword in the S3 pipe requests a 
GPR write. The stall effect is exactly the same as RXS_STALL. The stall breaks when the Ebox 
retires an instruction which in turn relinquishes RLOG resources. 

MRQ_STALL occurs when the S3_MICROWORD attempts a memory request but the 
M%SPEC_Q_FULL_H signal from the Mbox indicates that the request cannot be accepted. 

S3_STALLS block the S3 pipeline latch update, causing the S3 stage to execute the same 
stalled MICROWORD until the stall breaks. S3_STALLS also back-stall the S2 stage, in effect 
causing S2_STALL which blocks the S2 pipeline latch update. Both pipeline stages execute their 
respective stalled microwords until the stall condition breaks, allowing successful completion of 
the microword. The pipeline latches then continue to update as usual. 

RXS_STALL does not block the initiation of a memory request by the S3_MICROWORD. In other 
words, if the S3_MICROWORD indicates a memory request operation and no MRQ_STALL or 
RLOG_STALL exists, the request is initiated regardless of RXS_STALL. This somewhat de-coupled 
operation of the S3_STALLS breaks possible macroinstruction deadlocks due to the R0, (R0)+ case. 
While processing the specifier (R0)+ the CSU microcode performs a write to the GPR R0. An 
RXS_STALL will hold until the Ebox retires the first source, R0. The Ebox must retire two source 
operands at a time, and therefore cannot retire the R0 specifier until the MD for the second 
specifier is valid. 

The converse case, whether MRQ_STALL blocks a register write, is not an architectural or 
performance issue. This implementation blocks register writes during an MRQ_STALL. 
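
The interaction of the three S3 stall conditions can be sketched as a single-cycle Python function. This is a simplified model under stated assumptions: the function and key names are hypothetical, and suppression of a repeated memory request while the same microword remains stalled is not modeled.

```python
def s3_controls(rxs_score, writes_gpr, rlog_full, wants_mem_req, spec_q_full):
    """Evaluate the S3 stall conditions for one cycle.

    rxs_score     -- snapshot S3_RXS_SCORE for the GPR indexed by S3_RN
    writes_gpr    -- S3_MICROWORD requests an Ebox GPR write
    rlog_full     -- RLOG_FULL is asserted
    wants_mem_req -- S3_MICROWORD requests a memory reference
    spec_q_full   -- M%SPEC_Q_FULL_H from the Mbox
    """
    rxs_stall = writes_gpr and rxs_score != 0
    rlog_stall = writes_gpr and rlog_full
    mrq_stall = wants_mem_req and spec_q_full
    stall = rxs_stall or rlog_stall or mrq_stall
    return {
        "RXS_STALL": rxs_stall,
        "RLOG_STALL": rlog_stall,
        "MRQ_STALL": mrq_stall,
        "S3_STALL": stall,
        # The request goes out regardless of RXS_STALL, which breaks the
        # deadlock in the R0, (R0)+ case described above.
        "issue_mem_req": wants_mem_req and not mrq_stall and not rlog_stall,
        # Register writes are blocked during any S3 stall, including MRQ_STALL.
        "do_gpr_write": writes_gpr and not stall,
    }
```

In the (R0)+ example, the call `s3_controls(1, True, False, True, False)` reports an RXS_STALL while still issuing the memory request, so the MD for the second operand can become valid and the Ebox can retire both sources.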

Figure 7-12: Complex Specifier Unit Data Path Block Diagram 

7.4.2.3 RLOG 

The register log or RLOG allows the Ibox to restore the state of the GPRs under certain exception 
conditions. Because of the pipeline organization, the Ibox works on macroinstructions ahead of 
the Ebox execution. Any or all of six possible operand specifiers for any distinct macroinstruction 
may be auto-increment or auto-decrement mode, which by definition modify the GPRs. The Ibox 
must log all modifications to the GPRs for these operand specifiers and keep the log until the 
Ebox has retired the associated instruction. If the instruction stream gets redirected due to a 
branch or exception, then the Ibox uses the RLOG to restore the GPR registers to the condition 
expected at the time of the redirection. 

The RLOG is an 8-entry circular queue with read and write pointers. Each entry is composed of 
7 bits: 4 bits contain the GPR number, 2 bits specify DL, and 1 bit indicates auto-increment or 
auto-decrement. 

Elements are added to the RLOG under control of the S3_MICROWORD /DST field. When the 
microword specifies a register log operation, then S3_RN, S3_DL, and the encoded /ALU_FNC are 
entered in the RLOG entry pointed to by the write pointer. The write pointer is then incremented 
modulo 8. If the RLOG write pointer reaches the state in which another increment causes the 
write pointer to equal the read pointer, then the RLOG is full. The RLOG full condition may 
cause an RLOG_STALL as described in Section 7.4.2.2.3. 

The RLOG only contains specifier state for macroinstructions which the Ebox has not executed. 
When the Ebox retires a macroinstruction, the RLOG discards RLOG entries associated with 
that macroinstruction, by advancing the RLOG read pointer. The RLOG_BASE_POINTER and 
RLOG_BASE_QUEUE provide the means for read pointer advancement. 

The RLOG_BASE_POINTER increments anytime a valid auto-increment address mode specifier, 
auto-decrement address mode specifier, auto-increment assist, or auto-decrement assist appears 
on SPEC_CTRL. In effect, the RLOG_BASE_POINTER allocates RLOG spaces for the CSU to make 
subsequent entries. The RLOG_BASE_POINTER is loaded into the 6-entry RLOG_BASE_QUEUE 
each time a new PC is loaded into the PC_QUEUE. The RLOG_BASE_QUEUE thus maintains an 
RLOG read pointer for every PC in the PC_QUEUE. The RLOG_BASE_QUEUE and the PC_QUEUE 
both retire entries when the Ebox asserts E%RETIRE_INSTR_L indicating that it has retired a 
macroinstruction. The RLOG read pointer loads the value of the next RLOG_BASE_QUEUE entry 
at this time. 

The CSU microcode controls the RLOG unwind procedure. RLOG unwind consists of repeatedly 
executing a microword that updates the GPR registers based on indirect references to RLOG_RN, 
RLOG_DL, and RLOG_FUNC. The RLOG supplies the values for the indirect references from the 
entry pointed to by the read pointer. This entry is retired by incrementing the read pointer. The 
RLOG retires successive entries until the read pointer is equal to the write pointer; at that point the RLOG 
is empty. The unwind procedure then completes and the RLOG is flushed by resetting the 
RLOG read and write pointers, the RLOG_BASE_POINTER, and the RLOG_BASE_QUEUE read and 
write pointers. If the RLOG is empty when the microcode initiates an unwind, 0 will be added 
to whatever GPR is pointed to by the read pointers. 
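
The logging and unwind behavior described in this section can be modeled with a minimal Python sketch. The class and method names are hypothetical. Two assumptions are made for illustration: an unwind undoes a logged auto-increment by subtracting the operand size and undoes an auto-decrement by adding it, and the RLOG_BASE_POINTER and RLOG_BASE_QUEUE are omitted.

```python
class RLog:
    """Simplified model of the 8-entry circular register log."""

    SIZE = 8

    def __init__(self):
        self.entries = [None] * self.SIZE
        self.read = 0
        self.write = 0

    def is_empty(self):
        return self.read == self.write

    def is_full(self):
        # Full when one more increment would make write equal read.
        return (self.write + 1) % self.SIZE == self.read

    def log(self, rn, dl, is_increment):
        """Record one GPR modification: 4-bit RN, 2-bit DL, 1-bit direction."""
        if self.is_full():
            raise RuntimeError("RLOG full; the hardware would RLOG_STALL here")
        self.entries[self.write] = (rn & 0xF, dl & 0x3, bool(is_increment))
        self.write = (self.write + 1) % self.SIZE

    def unwind(self, gprs):
        """Restore the GPRs from the logged entries, then flush the log."""
        while not self.is_empty():
            rn, dl, is_increment = self.entries[self.read]
            size = 1 << dl  # RLOG_KDL: 1, 2, 4, or 8 bytes
            delta = -size if is_increment else size  # assumed undo direction
            gprs[rn] = (gprs[rn] + delta) & 0xFFFFFFFF
            self.read = (self.read + 1) % self.SIZE
        self.read = self.write = 0
```

Because of the full condition, this model holds at most seven live entries at a time.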

7.4.2.4 Branch Mispredict Effects 

When the Ebox asserts E%BRANCH_MISPREDICT_L, the NOP microword is forced into the S3 
pipeline stage, the S1 pipe latch valid bit is cleared, and the next microaddress logic selects 
the MISPREDICT_UNWIND utility routine address. The microcode at this location unwinds the 
RLOG and then restarts the Ibox. If the RLOG is empty when the microcode initiates an unwind, 
0 will be added to whatever GPR is pointed to by the read pointers. Note that the RLOG is NOT 
flushed on the assertion of E%BRANCH_MISPREDICT_L. It needs to remain intact to be unwound 
by CSU microcode. 

IMD_VALID is reset upon the assertion of E%BRANCH_MISPREDICT_L. 

7.4.2.5 E%STOP_IBOX Effects 

When the Ebox asserts E%STOP_IBOX_H, the microsequencer jams the CSU to the idle state, except 
in the case when the CSU is in the middle of the IPR transaction unwind RLOG/read back-up PC. 
In this situation, the RLOG will unwind until completion, and the read of the back-up PC will 
be disabled. The CSU is put into the idle state by forcing NOP microwords into the S2 and S3 
pipeline stages, clearing the S1 pipe latch valid bit, and selecting the IDLE microaddress. 

7.4.2.6 RSVD_ADDR_FAULT Effects 

When I%RSVD_ADDR_FAULT_H is asserted for a complex specifier, the S1 pipe latch valid bit is 
cleared. If there is no S1 stall, the NOP microword is forced into the S2 pipeline stage. Complex 
specifiers already in the CSU pipeline when I%RSVD_ADDR_FAULT_H is asserted are allowed to 
finish processing. 

7.4.2.7 CSU Microcode Restrictions 

The CSU microcode must guarantee, for all auto-increment, auto-increment deferred, and 
auto-decrement specifier microcode flows, that any specifier memory requests destined for the 
MD are issued before or during the microword that modifies the GPR. Otherwise, it is possible for 
the CSU to stall indefinitely due to an RXS_STALL. This is evident in the case ADDL2 R0,@(R0)+ 
where the Ebox must retire two source operands, and therefore cannot retire the R0 specifier 
until the MD for the second specifier is valid. The CSU microcode must also guarantee, for all 
auto-increment, auto-increment deferred, and auto-decrement specifier microcode flows, that the 
microword which initiates the memory request destined for the MD has the misc field 
stall_if_rlog_full if the following microword modifies the GPR. 

The CSU microcode must guarantee, for all auto-increment, auto-increment deferred, 
auto-decrement, and auto-decrement deferred specifier microcode flows with access type AV, that 
the microword which writes the MD is immediately followed by the microword that modifies the 
GPR. This, in conjunction with an Ebox microcode restriction, is necessary in order to prevent 
an infinite RXS stall from occurring. 

The CSU microcode must guarantee that memory requests which specify the Ibox IMD as the 
data destination are used only for deferred operand evaluation. For a microword with an [IMD] 
source, the previous microword must initiate the memory request with destination IMD, must 
not perform a GPR write, and must not have the misc field stall_if_rlog_full. All this is necessary to 
protect the use of an unconditional MD latch in the CSU datapath. 

7.4.2.8 Ibox IPR Transactions 

The Ebox microcode communicates with the Ibox in part through internal processor registers 
(IPRs). The IPR reads are handled by CSU microcode. The IPR write control is distributed; however, 
the description is included here for completeness. 

Ebox microcode conventions guarantee that the Ibox is idle before initiating Ibox IPR transactions. 
This is accomplished either by the knowledge that the current Ebox microcode flow takes place in 
a macroinstruction with a drain Ibox assist or by asserting an explicit E%STOP_IBOX_H command. 
The only exception involves the issuing of an IPR transaction when the CSU is involved in an RLOG 
unwind operation. In this case the unwind finishes in the CSU, then the CSU processes the latched 
IPR command. If the RLOG is empty when the microcode initiates an unwind, 0 will be added to 
whatever GPR is pointed to by the read pointers. 

MICROCODE RESTRICTION 

E%IBOX_LOAD_PC_L and E%IBOX_IPR_WRITE_H must not occur in the same cycle. 

7.4.2.8.1 IPR Reads 

The Ebox signifies an IPR read by asserting the E%IBOX_IPR_READ_H strobe, the 
E%IBOX_IPR_TAG_H<2:0>, and the E%IBOX_IPR_NUM_H<3:0>. This information is latched in the 
S1 logic stage, and an IPR request flag is posted. The S1 next address logic responds by creating 
an IPR dispatch to an IPR microaddress in the utility page of microcode, and by clearing the IPR 
request flag. All Ibox logic blocks associated with IPR reads examine the E%IBOX_IPR_TAG_H<2:0>. 
If the IPR source is within a section, that section prepares to drive the IPR read data onto 
the VIC_REQ_ADDR. The microcode at the common IPR routine reads the VIC_REQ_ADDR, passes 
the value through the ALU, and writes the data to an Ebox working register located at the 
E%IBOX_IPR_NUM_H<3:0> offset in the register array. The VIC_REQ_ADDR is used for IPR read 
data source simply because it is a convenient 32-bit bus that runs through the entire section. 

7.4.2.8.2 IPR Writes 

The Ebox signifies an IPR write by asserting the E%IBOX_IPR_WRITE_H strobe and the 
E%IBOX_IPR_TAG_H<2:0>. All Ibox logic blocks associated with IPR writes examine the 
E%IBOX_IPR_TAG_H<2:0>. If the IPR destination is within a section, that section prepares to accept 
the IPR write data from the M%MD_BUS_H<63:0>. The Mbox drives the M%MD_BUS_H<63:0> with 
IPR data and asserts M%IBOX_IPR_WR_H to complete the transaction. 

7.4.3 Scoreboard Unit 

The Scoreboard Unit (SBU) keeps track of the number of outstanding references to GPRs in the 
source and destination queues. The SBU contains two arrays of 15 counters: the RXS_ARRAY for 
the source queue and the RXD_ARRAY for the destination queue. The counters in the arrays map 
one-to-one with the GPRs. There is no scoreboard counter corresponding to GPR 15, the PC, 
because RMODE operations to the PC are unpredictable. The maximum number of outstanding 
operand references determines the maximum count value for the counters. This value is based 
on the length of the source and destination queues. The RXS_ARRAY counts up to 12 and the 
RXD_ARRAY counts up to 6. 

Each time valid register mode source specifiers appear on SPEC_CTRL<13:0>, the RXS_ARRAY 
counters that correspond with those registers are incremented. At the same time, the OQU 
inserts entries pointing to these registers in the source queue. In other words, for each register 
mode source queue entry, there is a corresponding RXS_ARRAY counter increment. This implies a 
maximum of 2 counters incrementing each cycle when a quadword register mode source operand 
is parsed. Each counter may only be incremented by 1. When the Ebox removes the source queue 
entries, the counters are decremented. The Ebox removes up to 2 register mode source queue 
entries per cycle as indicated on E%SQ_RETIRE_RMODE_H<1:0>. The GPR numbers for these 
registers are provided by the Ebox on E%SQ_RETIRE_RN1_H<3:0> and E%SQ_RETIRE_RN2_H<3:0>. 
A maximum of 2 counters may decrement each cycle, or any one counter may be decremented by 
up to 2, if both register mode entries being retired point to the same base register. 

In a similar fashion, when a new register mode destination specifier appears on SPEC_CTRL<13:0>, 
the RXD_ARRAY counter that corresponds to that register is incremented. A maximum of 2 
counters increment in one cycle for a quadword register mode destination operand. When the 
Ebox removes a destination queue entry, the counter is decremented. The Ebox indicates removal 
of a register mode destination queue entry on E%DQ_RETIRE_RMODE_H. The GPR number for the 
register is provided by the Ebox on E%DQ_RETIRE_RN_H<3:0>. 

Whenever a complex specifier is parsed, the GPR associated with that specifier is used as an index 
into the source and destination scoreboard arrays and snapshots of both scoreboard counter values 
are passed to the CSU on RXS_SCORE<3:0> and RXD_SCORE<2:0>. The CSU stalls if it needs to read 
a GPR for which the destination scoreboard counter value is non-zero. A non-zero destination 
counter indicates that there is at least one pointer to that register in the destination queue. This 
means that there is a future Ebox write to that register and that its current value is invalid. 
The CSU also stalls if it needs to write a GPR for which the source scoreboard counter value is 
non-zero. A non-zero source scoreboard value indicates that there is at least one pointer to that 
register in the source queue. This means that there is a future Ebox read to that register and 
its contents must not be modified. For both scoreboards, the copies in the CSU pipe are locally 
decremented on assertion of the retire signals from the Ebox. 

7.4.3.1 E%STOP_IBOX and Branch Mispredict PC Load Effects 

Whenever a branch mispredict PC load occurs, or the Ebox issues an E%STOP_IBOX_H, all scoreboard 
array counters are cleared. 
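
The counter behavior of the SBU can be illustrated with a simplified Python model. The class and method names are hypothetical. Raising an exception for GPR 15 or for a count beyond the stated maximum is a modeling choice made here; the text does not describe what the hardware does in those cases.

```python
class Scoreboard:
    """Simplified model of the RXS_ARRAY and RXD_ARRAY counters."""

    NUM_COUNTERS = 15  # GPR 0..14; there is no counter for GPR 15, the PC
    RXS_MAX = 12       # source queue counters count up to 12
    RXD_MAX = 6        # destination queue counters count up to 6

    def __init__(self):
        self.rxs = [0] * self.NUM_COUNTERS
        self.rxd = [0] * self.NUM_COUNTERS

    def _check(self, rn):
        if not 0 <= rn < self.NUM_COUNTERS:
            raise ValueError("no scoreboard counter for this register")

    def add_source(self, rn):
        self._check(rn)
        if self.rxs[rn] >= self.RXS_MAX:
            raise OverflowError("RXS counter above its maximum")
        self.rxs[rn] += 1

    def add_dest(self, rn):
        self._check(rn)
        if self.rxd[rn] >= self.RXD_MAX:
            raise OverflowError("RXD counter above its maximum")
        self.rxd[rn] += 1

    def retire_sources(self, rns):
        """Retire up to 2 register mode source entries in one cycle."""
        if len(rns) > 2:
            raise ValueError("at most 2 source entries retire per cycle")
        for rn in rns:
            self._check(rn)
            self.rxs[rn] -= 1  # the same counter may drop by 2

    def retire_dest(self, rn):
        self._check(rn)
        self.rxd[rn] -= 1

    def snapshot(self, rn):
        """(RXS_SCORE, RXD_SCORE) passed to the CSU for a complex specifier."""
        self._check(rn)
        return self.rxs[rn], self.rxd[rn]

    def clear(self):
        """Branch mispredict PC load or E%STOP_IBOX_H clears every counter."""
        self.rxs = [0] * self.NUM_COUNTERS
        self.rxd = [0] * self.NUM_COUNTERS
```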

7.5 Branch Prediction 

The Branch Prediction Unit (BPU) monitors each instruction opcode as it is parsed, looking for 
a branch opcode. Upon identification of a branch opcode, the BPU predicts whether or not the 
branch will be taken. If the BPU predicts the branch will be taken, it adds the sign extended 
branch displacement to the current PC and broadcasts the resulting new PC to the rest of the 
Ibox on the NEW_PC lines. 

7.5.1 Branch Prediction Unit 

7.5.1.1 The Branch Prediction Algorithm 

The BPU uses a "Branch History" algorithm for predicting branches. The basic premise behind 
this algorithm is that branch behavior tends to be patterned. If one looks in a program at one 
particular branch instruction, and traces over time that instruction's history of branch taken vs. 
branch not taken, in most cases a pattern develops. Branch instructions that have a past history 
of branching seem to maintain that history and are more likely to branch than not branch in 
the future. Branch instructions which follow a pattern such as branch, no branch, branch, no 
branch etc., are likely to maintain that pattern. Branch history algorithms for branch prediction 
attempt to take advantage of this "branch inertia". 

The NVAX branch prediction unit uses a table of branch histories and a prediction algorithm based 
on the past history of the branch. When the BPU encounters a conditional branch opcode, a subset 
of the opcode PC bits is used to access the branch history table. The output from the table is a 
4 bit field containing the branch history information for the branch. From these 4 history bits, a 
new prediction is calculated indicating the expected branch path. 

Many different opcode PCs map to each entry of the branch table because only a subset of the PC 
bits form the index. When a branch opcode changes outside of the index region, the history table 
entry that it indexes may be based on a different branch opcode. The branch table relies on the 
principle of locality, and assumes that, having switched PCs, the current process operates within 
a small region for a period of time. This allows the branch history table to generate pertinent 
history relating to the new PC within a few branches. 

The branch history information consists of a string of 1's and 0's indicating what that branch did 
the last four times it was seen. For example, 1100, read from right to left, indicates that the last 
time this branch was seen it did not branch. Neither did it branch the time before that. But then 
it branched the two previous times. The prediction bit is the result of passing the history bits 
that were stored through logic which predicts the direction a branch will go given the history of 
its last four branches. 

The prediction algorithm is accessible via IPR for software programming and testability reasons. 
After power-up, the Ebox microcode initializes the branch prediction algorithm segment of the 
BPCR register with an algorithm which is the result of extensive simulation and statistics 
gathering. While it would be possible to create a program for which this prediction logic is 
wrong all the time, on average it does very well. This algorithm is shown in Table 7-30. The 
BPCR is discussed in greater detail in Section 7.5.1.8. 

7.5.1.2 The Branch History Table 

The 512 entries in the branch table are indexed by the opcode PC<8:0>. Each branch table entry, 
as depicted in Figure 7-13, contains the previous four branch history bits for branch opcodes at 
this index. The Ebox asserts E%FLUSH_BPT_H under microcode control during process context 
switches. This signal resets all branch table entries to a neutral value: history = 0100. This will 
result in a next prediction of 0. 
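
The indexing and flush behavior described above can be sketched as follows. This is a minimal model under stated assumptions: the names `table_index` and `BranchHistoryTable` are hypothetical, and the table is created in the flushed state purely for convenience (the hardware relies on E%FLUSH_BPT_H being asserted before the first branch).

```python
TABLE_ENTRIES = 512        # one entry per value of opcode PC<8:0>
NEUTRAL_HISTORY = 0b0100   # value written to every entry by a flush


def table_index(opcode_pc):
    """Return PC<8:0>, the nine low-order PC bits that select a table entry."""
    return opcode_pc & 0x1FF


class BranchHistoryTable:
    def __init__(self):
        self.entries = [NEUTRAL_HISTORY] * TABLE_ENTRIES

    def flush(self):
        """Model of E%FLUSH_BPT_H: reset every entry to the neutral history."""
        self.entries = [NEUTRAL_HISTORY] * TABLE_ENTRIES

    def read(self, opcode_pc):
        return self.entries[table_index(opcode_pc)]

    def write(self, opcode_pc, history):
        self.entries[table_index(opcode_pc)] = history & 0xF
```

Because only nine PC bits form the index, two branches whose PCs differ only above bit 8 (for example 0x1234 and 0x5234) share one entry in this model.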

MICROCODE RESTRICTION 

E%FLUSH_BPT_H may only occur while the Ibox is stopped. E%FLUSH_BPT_H must be 
asserted before the first branch is executed. 

Figure 7-13: Branch Table Entry Format 



  3   2   1   0
+---+---+---+---+
|   |   |   |   |
+---+---+---+---+
              ^
              |
        (most recent)



7.5.1.3 Branch Prediction Sequence 

When the BPU encounters a conditional branch opcode it reads the branch table entry indexed by 
PC<8:0>. If the prediction logic indicates the branch taken, then the BPU sign extends and adds 
the branch displacement supplied by the IBU to the current PC, and broadcasts the result to the 
Ibox on the NEW_PC lines. If the prediction bit indicates not to expect a branch taken, then the 
current PC in the Ibox remains unaffected. 

The alternate PC in both cases (current PC in predicted taken case, and branch PC in predicted 
not taken case) is retained in the BPU until the Ebox retires the conditional branch. When the 
Ebox retires a conditional branch, it indicates the actual direction of the branch. The BPU uses 
the alternate PC to redirect the Ibox in the case of an incorrect prediction. Section 7.5.1.7 has 
more details on mispredicted branches. 

The branch table is written with new history each time a conditional branch is encountered. Once 
a prediction is made, the oldest of the branch history bits is discarded. The remaining 3 branch 
history bits and the new predicted history bit are written back to the table at the same branch PC 
index. When the Ebox retires a branch queue entry for a conditional branch, if there was not a 
mispredict, the new entry is unaffected and the BPU is ready to process a new conditional branch. 
If a mispredict is signaled, the same branch table entry is rewritten; this time the least significant 
history bit receives the complement of the predicted direction, reflecting the true direction of the 
branch. 
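
The history update just described amounts to a 4-bit shift register with a late correction of its newest bit. The sketch below illustrates that reading; the function names are hypothetical, and bit 0 is taken to be the most recent outcome, as in Figure 7-13.

```python
def speculative_update(history, predicted_taken):
    """Drop the oldest bit (bit 3) and shift the predicted direction into bit 0."""
    return ((history << 1) & 0xF) | int(bool(predicted_taken))


def correct_on_mispredict(history):
    """Complement bit 0 so that it records the true direction of the branch."""
    return history ^ 0b0001
```

For example, starting from history 0100 and a not-taken prediction, the entry becomes 1000; if the Ebox then signals a mispredict, the entry is rewritten as 1001.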

The branch prediction logic is based on the contents of the BPCR register, described in 
Section 7.5.1.8. After power-up, as part of the initialization sequence, the Ebox microcode 
initializes the BPCR to ECC8 (HEX) which implements the truth table in Table 7-30. 

MICROCODE RESTRICTION 

An IPR write to the BPCR register in the BPU is required after power-up to load the 
branch prediction algorithm. 

Table 7-30: Branch Prediction Logic 

Branch History    Prediction for Next Branch 

0000              Not Taken 
0001              Taken 
0010              Not Taken 
0011              Taken 
0100              Not Taken 
0101              Not Taken 
0110              Taken 
0111              Taken 
1000              Not Taken 
1001              Taken 
1010              Taken 
1011              Taken 
1100              Taken 
1101              Taken 
1110              Taken 
1111              Taken 

7.5.1.4 The Branch Queue 

Each time the BPU makes a prediction on a branch opcode, it sends information about that 
prediction to the Ebox on the I%BRANCH_BUS_H<1:0>. The Ebox maintains a queue of branch data 
entries containing information about branches that have been processed by the BPU but not by 
the Ebox. The bus is 2 bits wide: one valid bit and one bit to indicate whether the Ibox took 
the branch or not. Entries are made to the branch queue for both conditional and unconditional 
branches. For unconditional branches, the value of I%BRANCH_BUS_H<0> is ignored by the Ebox. 
The branch queue length is selected such that it does not overflow, even if the entire instruction 
queue is filled with branch instructions, and there are branch instructions currently in the Ebox 
pipeline. At any one time there may be only one conditional branch in the queue. A queue entry 
is not made until a valid displacement has been processed. In the case of a second conditional 
branch encountered while a first is still outstanding, the entry may not be made until the first 
conditional branch has been retired. 

7.5.1.5 Branch Mispredict 

When the Ebox executes a branch instruction and it makes the final determination on whether 
the branch should or shouldn't be taken, it removes the next element from the branch queue and 
compares the direction taken by the Ibox with the direction that should be taken. If these differ, 
then the Ebox sends E%BRANCH_MISPREDICT_L to the BPU. A mispredict causes the Ibox to stop 
processing, undo (using the RLOG) any GPR modifications made while parsing down the wrong 
path, and restart processing at the correct alternate PC. 

7.5.1.6 Branch Stall 

The BPU back-pressures the IBU by asserting BRANCH_STALL when it encounters a new conditional 
branch with a conditional branch already outstanding. If the BPU has processed a conditional 
branch but the Ebox has not yet executed it, then another conditional branch causes the BPU to 
assert BRANCH_STALL. Unconditional branches that occur with conditional branches outstanding 
do not create a problem because the instruction stream merely requires redirection. The alternate 
PC remains unchanged until resolution of the conditional branch. The Ebox informs the BPU with 
the E%BCOND_RETIRE_L each time a conditional branch is retired from the branch queue in order 
for the BPU to free up the alternate PC and other conditional branch hardware. 

BRANCH_STALL blocks the Ibox from processing further opcodes. When BRANCH_STALL is 
asserted, the IBU finishes parsing the current conditional branch instruction, including the branch 
displacement and any assists, and then the IBU stalls. The branch queue entry to the Ebox is 
made after the first conditional branch is retired. At this time, BRANCH_STALL is de-asserted and 
the alternate PC for the first conditional branch is replaced with that for the second. 

BSTL_FRC_PCQ is a signal used by the PC queue logic to force an entry into the PC queue when 
the second conditional branch is finally processed by the BPU after the release of a BRANCH_STALL. 
During a BRANCH_STALL, the PC queue refrains from updating the last entry to point to the next 
instruction until the stall breaks and the BPU finishes processing the second conditional branch. 
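
The interaction between the branch queue, the single outstanding conditional branch, and the branch stall can be sketched as a small state machine. This is a simplified illustration under stated assumptions: all names are hypothetical, timing and the alternate PC are not modeled, and on a mispredict the model only drops the held queue write; it does not model what the Ebox does with its own queue.

```python
from collections import deque


class BranchStallModel:
    def __init__(self):
        self.branch_queue = deque()      # entries: (is_conditional, predicted_taken)
        self.conditional_outstanding = False
        self.held_prediction = None      # prediction of a stalled second conditional

    @property
    def branch_stall(self):
        """True while a second conditional branch waits for the first to retire."""
        return self.held_prediction is not None

    def bpu_sees_branch(self, is_conditional, predicted_taken=True):
        if self.branch_stall:
            raise RuntimeError("IBU is stalled; no further opcodes are parsed")
        if is_conditional and self.conditional_outstanding:
            # Hold the branch queue entry until the first conditional retires.
            self.held_prediction = predicted_taken
            return
        if is_conditional:
            self.conditional_outstanding = True
        self.branch_queue.append((is_conditional, predicted_taken))

    def ebox_retires_branch(self, actually_taken=True):
        """Remove the next queue entry; return True if it was mispredicted."""
        is_conditional, predicted_taken = self.branch_queue.popleft()
        if not is_conditional:
            return False  # direction bit is ignored for unconditional branches
        mispredict = predicted_taken != actually_taken
        self.conditional_outstanding = False
        if mispredict:
            self.held_prediction = None  # pending branch queue write is cleared
        elif self.held_prediction is not None:
            # The stall breaks: the held entry is finally written to the queue.
            self.branch_queue.append((True, self.held_prediction))
            self.held_prediction = None
            self.conditional_outstanding = True
        return mispredict
```

In this sketch an unconditional branch seen while a conditional branch is outstanding is queued immediately, whereas a second conditional branch sets `branch_stall` until the first one retires.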

7.5.1.7 PC Loads 

The BPU distributes all PC loads to the rest of the Ibox. 

Ibox PC loads from the CSU microcode load a new PC in one of two ways. When the CSU asserts 
PC_LD_WBUS, it drives a new PC value on the I%IBOX_IW_BUS_H<31:0> lines. PC_LD_MD indicates 
that the new PC is on the M%MD_BUS_H<63:0> lines. The BPU responds by forwarding the 
appropriate value onto the NEW_PC<31:0> lines and asserting LOAD_NEW_PC. These Ibox PC 
loads do not change conditional branch state in the BPU. 

The Ebox signals its intent to load a new PC by asserting E%IBOX_LOAD_PC_L. The assertion 
of this signal indicates that the next piece of IPR data to arrive on the M%MD_BUS_H<63:0> 
is the new PC. The next time the Mbox asserts M%IBOX_IPR_WR_H, the new PC is taken from 
M%MD_BUS_H<31:0> and forwarded onto NEW_PC<31:0> and LOAD_NEW_PC is asserted. 

The BPU performs unconditional branches by adding the sign extended branch displacement to 
the current PC, driving the new PC onto the NEW_PC<31:0> lines and asserting LOAD_NEW_PC. 
Conditional branches load the PC in the same fashion if the logic predicts a branch taken. The 
following actions occur on a conditional branch mispredict or Ebox PC load: 

• any pending conditional branch is cleared 

• pending unconditional branches are cleared 

• any pending write to the Ebox branch queue is cleared 

• I%FLUSH_IREF_LAT_H is asserted to abort pending Istream fill requests in the Mbox 

7.5.1.8 Branch Prediction IPR Register 

The BPCR IPR provides control for the BPU and read/write access to the history array. The 
write-only BPCR<FLUSH_BHT> bit causes a BPU branch history table flush. The flush is identical 
to the context switch flush, which resets all branch table entries to a neutral value: history bits 
= 0100. The write-only BPCR<FLUSH_CTR> bit causes the BRANCH_TABLE_COUNTER<8:0> to be 
cleared. The BRANCH_TABLE_COUNTER provides an address into the branch table for IPR read and 
write accesses. Each IPR read from the BPCR, or write to the BPCR with BPCR<LOAD_HISTORY> 
= 1, increments the counter. This allows IPR branch table reads and writes to step through 
the branch table array. 

BPCR<LOAD_HISTORY> enables writes to the branch history table. A 
write to the BPCR<HISTORY> field with BPCR<LOAD_HISTORY> = 1 causes a BPU branch history 
table write: the history bits for the entry indexed by the counter are written with the IPR data. 
BPCR reads supply the history bits in BPCR<HISTORY> for the entry indexed by the counter. 
BPCR<MISPREDICT> returns a "1" if the last conditional branch mispredicted. 

BPCR<31:16> contain the branch prediction algorithm. Any IPR write to the BPCR updates the 
algorithm, and an IPR read returns the value of the current algorithm. For example, a "0" in BPCR<16> 
means that the next branch encountered will not be taken if the history is "0000". A "1" in 
BPCR<21> means that the next branch encountered when the prior history is "0101" will be 
taken. 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10  9  8| 7  6  5  4| 3  2  1  0
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                 BPU_ALGORITHM                 |         0          |  |  |  |  | 0|  HISTORY  | :BPCR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                      |  |  |  |
                                                  LOAD_HISTORY -------+  |  |  |
                                                  FLUSH_CTR -------------+  |  |
                                                  FLUSH_BHT ----------------+  |
                                                  MISPREDICT ------------------+


The microcode will write the following bit pattern as part of the power-up sequence: 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10  9  8| 7  6  5  4| 3  2  1  0

                                          All 0's 



Table 7-31: BPCR Field Descriptions 

Name            Extent   Type   Description 

HISTORY         3:0      RW     Branch history table entry history bits. 
MISPREDICT      5        RO     Indicates if last conditional branch mispredicted. 
FLUSH_BHT       6        WO     Write of a 1 resets all history table entries to a neutral value, 
                                hardware clears bit. 
FLUSH_CTR       7        WO     Write of a 1 resets BPCR address counter to 0, hardware clears bit. 
LOAD_HISTORY    8        WO     Write history array addressed by BPCR address counter. 
BPU_ALGORITHM   31:16    RW     Controls direction of branch for given history. 
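
The field layout and the algorithm lookup can be illustrated in software. The sketch below is not the BPU logic itself: the names `BpcrFields`, `decode_bpcr` and `predict_taken` are hypothetical, and the example assumes that ECC8 (hex) is the value placed in the algorithm field BPCR<31:16>.

```python
from dataclasses import dataclass


@dataclass
class BpcrFields:
    history: int        # BPCR<3:0>
    mispredict: int     # BPCR<5>
    flush_bht: int      # BPCR<6>
    flush_ctr: int      # BPCR<7>
    load_history: int   # BPCR<8>
    bpu_algorithm: int  # BPCR<31:16>


def decode_bpcr(value):
    """Split a 32-bit BPCR value into its fields."""
    return BpcrFields(
        history=value & 0xF,
        mispredict=(value >> 5) & 1,
        flush_bht=(value >> 6) & 1,
        flush_ctr=(value >> 7) & 1,
        load_history=(value >> 8) & 1,
        bpu_algorithm=(value >> 16) & 0xFFFF,
    )


def predict_taken(bpu_algorithm, history):
    """Look up the prediction for a 4-bit history: algorithm bit `history`,
    which is BPCR<16 + history>, is 1 for taken and 0 for not taken."""
    return bool((bpu_algorithm >> (history & 0xF)) & 1)
```

With an algorithm field of ECC8 (hex), `predict_taken(0xECC8, 0b0100)` is False, which agrees with the statement that the neutral history 0100 produces a next prediction of 0.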



MACROCODE RESTRICTION 

If an MTPR to the BPCR register is followed by a conditional branch instruction, 
the prediction algorithm used for this branch is unpredictable. Furthermore, the 
branch history table update is also unpredictable. The BPU functions correctly, but 
programs which depend on particular patterns of branch predictions (such as diagnostic 
tests) should avoid placing conditional branch instructions immediately after an MTPR 
instruction that writes to the BPCR register. 

Bits 8,7,6 are defined in Table 7-32 for IPR writes to the BPCR. NOTE: The prediction algorithm 
will be updated on every IPR write to the BPCR. 

Table 7-32: BPCR <8:6> 

BIT 8   BIT 7   BIT 6   Write Action 

0       0       0       Do nothing, except update algorithm 
0       0       1       Flush branch table. History not written 
0       1       0       Address counter reset to 0. History not written 
0       1       1       Flush branch table, reset address counter, history not written 
1       0       0       Write history to table, counter automatically increments 
1       0       1       Undefined: Branch table flushed, new history written, counter incremented 
1       1       0       Undefined: Write history to old counter value, counter reset to 0 
1       1       1       Undefined: Branch table flushed, write history to old counter value, counter 
                        reset to 0 
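
The defined write actions of Table 7-32, together with the counter behavior on reads, can be sketched as a small model. This is an illustration only: the class name `BpcrModel` is hypothetical, the 9-bit counter is assumed to wrap from 511 to 0, MISPREDICT is not modeled on reads, and the model deliberately rejects the three combinations marked Undefined rather than imitating them.

```python
NEUTRAL_HISTORY = 0b0100
TABLE_ENTRIES = 512


class BpcrModel:
    def __init__(self):
        self.table = [NEUTRAL_HISTORY] * TABLE_ENTRIES
        self.counter = 0     # BRANCH_TABLE_COUNTER<8:0>
        self.algorithm = 0   # BPCR<31:16>

    def write(self, value):
        """Model an IPR write to the BPCR."""
        self.algorithm = (value >> 16) & 0xFFFF  # updated on every write
        load_history = (value >> 8) & 1
        flush_ctr = (value >> 7) & 1
        flush_bht = (value >> 6) & 1
        if load_history and (flush_ctr or flush_bht):
            raise ValueError("undefined combination of BPCR<8:6>")
        if flush_bht:
            self.table = [NEUTRAL_HISTORY] * TABLE_ENTRIES
        if flush_ctr:
            self.counter = 0
        if load_history:
            self.table[self.counter] = value & 0xF
            self.counter = (self.counter + 1) % TABLE_ENTRIES

    def read(self):
        """Model an IPR read: return algorithm and history, then step the counter."""
        history = self.table[self.counter]
        self.counter = (self.counter + 1) % TABLE_ENTRIES
        return (self.algorithm << 16) | history
```

A diagnostic that wants to dump the whole table would, in this model, write once with FLUSH_CTR set and then call `read()` 512 times.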



7.6 PC Load Effects 

This section summarizes the various effects of loading a new PC in the Ibox. New PCs are loaded 
from five different sources. The BPU receives the new PCs from all these sources, drives the new 
PC on NEW_PC<31:0>, and asserts LOAD_NEW_PC. The five sources for new PCs, in priority order, 
are: 

1. Ebox PC load from the M%MD_BUS_H<31:0> 

The Ebox loads a new PC as a result of an interrupt or exception or for instructions like 
REI, HALT, CASEx etc. After the Ebox asserts the E%IBOX_LOAD_PC_L signal, the PC is 
supplied on the M%MD_BUS_H<31:0>, along with the M%IBOX_IPR_WR_H signal. The BPU 
selects M%MD_BUS_H<31:0> to drive NEW_PC<31:0> and asserts LOAD_NEW_PC. 

2. Branch Mispredict PC 

When a mispredict has been detected, the BPU drives NEW_PC<31:0> from the alternate PC 
latch containing the address of the branch path not taken, and asserts LOAD_NEW_PC. 

3. PC_LD_WBUS from the CSU 

For instructions like JSB and JMP, the CSU computes a new PC and drives that PC up to the 
BPU. The BPU receives the PC on I%IBOX_IW_BUS_H <31:0>, drives NEW_PC<31:0> and asserts 
LOAD_NEW_PC. 

4. PC_LD_MD from the CSU 

For instructions like JSB, JMP, RET and RSB, the CSU requests a new PC from the Mbox. 
The CSU asserts PC_LD_MD, and the next M%IBOX_DATA_L signals the new PC is on the 
M%MD_BUS_H<31:0>. The BPU receives the PC on M%MD_BUS_H<31:0>, drives NEW_PC<31:0> 
and asserts LOAD_NEW_PC. 

5. Branch Destination PC 

For unconditional branches or when the BPU predicts a conditional branch as taken, it 
computes the branch destination, drives NEW_PC<31:0>, and asserts LOAD_NEW_PC. 
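
The priority ordering of the PC sources listed above can be expressed as a simple selection function. This is a sketch only: the source names are hypothetical labels for the numbered items, and the model assumes that several sources could request a load in the same cycle.

```python
# Highest priority first, following the numbered list above.
PC_SOURCE_PRIORITY = [
    "ebox_pc_load",        # 1. Ebox PC load from the MD bus
    "branch_mispredict",   # 2. alternate PC after a mispredict
    "pc_ld_wbus",          # 3. PC_LD_WBUS from the CSU
    "pc_ld_md",            # 4. PC_LD_MD from the CSU
    "branch_destination",  # 5. unconditional or predicted-taken branch
]


def select_new_pc(requests):
    """Pick the PC to drive on NEW_PC<31:0>.

    `requests` maps a source name to the PC it wants to load.
    Returns (source, pc) for the highest-priority requester, or None.
    """
    for source in PC_SOURCE_PRIORITY:
        if source in requests:
            return source, requests[source] & 0xFFFFFFFF
    return None
```

For instance, if a branch destination and a mispredict PC are both pending, the function returns the mispredict PC.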

The effects of loading a new PC are shown below. These effects take place regardless of the source 
of the PC. 

• PREFETCH_ENABLE is set in the VIC. 

• VTBA<31:3> in the VIC are loaded from NEW_PC<31:3> 

• MHARD_ERR is cleared in the VIC. 

• IMMGT_EXC is cleared in the VIC. 

• MISS_PENDING is cleared in the VIC. 

• WRITE_PENDING is cleared in the VIC. 

• VIC_READ is set in the VIC, allowing a new cache read sequence from the new address. 

• The PFQ is flushed and NEW_PC<2:0> are latched as the initial BYTES_RETIRED. 

• The BPU asserts I%FLUSH_IREF_LAT_H indicating that the Mbox should flush its IREF latch. 

• The IBU stops the parser and latches the new PC from NEW_PC<31:0>. 

• The IIU latches the new PC as the next entry in PC queue. 
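
Two of the effects above split the new PC into an aligned quadword address and a byte offset. The sketch below shows that split; the function name is hypothetical, and the interpretation of the low three bits as the count of bytes to skip in the first quadword is an assumption.

```python
def split_new_pc(new_pc):
    """Return (NEW_PC<31:3>, NEW_PC<2:0>) for a 32-bit PC."""
    new_pc &= 0xFFFFFFFF
    quadword_address = new_pc >> 3   # loaded into the VIC address register
    bytes_retired = new_pc & 0x7     # latched as the initial BYTES_RETIRED
    return quadword_address, bytes_retired
```

For a new PC of 1235 (hex), the quadword address field is 246 (hex) and the initial byte offset is 5.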

7.6.1 Mispredict PC Loads 

When a PC load is the result of a branch mispredict, additional actions must be taken as described 
below: 

• All pending conditional and unconditional branches are cleared in the BPU. 

• Pending branch queue writes are aborted by the BPU. 

• In the IIU, the instruction queue free counter is cleared. 

• In the IIU, the PC queue is flushed. 

• In the IIU, ISSUE_STALL is cleared. 

• The SBU clears the scoreboard array counters. 

• In the CSU, the S1 stage produces the mispredict RLOG unwind microaddress. The S3 stage 
is forced to NOP. NOTE: The RLOG is NOT flushed. 

• In the CSU, IMD_VALID is reset. 

• In the OQU, the MD allocation pointer is reset and the MD allocation counter is cleared. 

• In the OQU, the source queue free counter is cleared. 

• In the OQU, the destination queue free counter is cleared. 

• In the CSU, the LD_PC_MD latch is cleared. 

7.6.2 Ebox PC Loads 

When the Ebox is the source of the new PC, the signal E%IBOX_LOAD_PC_L is asserted several 
cycles before the actual PC arrives from the Mbox. After this signal is asserted, but before the 
new PC is loaded, the signal E%RESTART_IBOX_H may be asserted, starting the parser and VIC 
prefetching. To avoid parsing from the wrong instruction stream, the following actions are taken 
upon the assertion of E%IBOX_LOAD_PC_L: 

• The PFQ is flushed, forcing PFQ_EMPTY to be asserted. 

• VIC prefetching is disabled until LOAD_NEW_PC is asserted by the BPU. This also blocks VIC 
bypass to the PFQ. 

• MHARD_ERR is cleared in the VIC. 

• IMMGT_EXC is cleared in the VIC. 

MICROCODE RESTRICTION 

E%IBOX_LOAD_PC_L and E%IBOX_IPR_WRITE_H must not occur in the same cycle. 
E%IBOX_LOAD_PC_L and E%RESTART_IBOX_H must not occur in the same cycle. 

7.7 E%STOP_IBOX Effects 

When the Ebox microcode performs a MISC/RESET_CPU it asserts E%STOP_IBOX_H. The Ibox 
requires E%STOP_IBOX_H to be asserted whenever RESET_L is asserted. 

MICROCODE RESTRICTION 

E%STOP_IBOX_H must always be followed by E%IBOX_LOAD_PC_L and then 
E%RESTART_IBOX_H. E%STOP_IBOX_H and E%BRANCH_MISPREDICT_L cannot occur in 
the same cycle. 

The effects of this signal on the various sub-sections in the Ibox are shown below. 

• PREFETCH_ENABLE is cleared in the VIC. 

• MISS_PENDING, WRITE_PENDING, and READ_STATE are cleared in the VIC, putting the VIC in 
an idle state. 

• HLARDJBRR is cleared in the VIC. 

• MHARD_ERR is cleared in the VIC. 

• IMMGT_EXC is cleared in the VIC. 

• In the IIU, the instruction queue free counter is cleared. 

• In the IIU, ISSUE_STALL is cleared. 

• The IQ_VALID signal, from the IIU to the Ebox, is cleared. 

• The Istream parser in the IBU is stopped. 

• The signals I%IMEM_HERR_H and I%IMEM_MEXC_H are cleared. 

• The PREV_NOT_DONE signal is cleared in the IBU 

• CSU_LD_PC_PEND is cleared in the IBU 

• LD_NEW_PC_PEND is cleared in the IBU 

• The FD opcode flip-flop is cleared in the IBU 

• The IDLE microword is injected into all stages of CSU pipeline. However, NOTE: RLOG 
unwind is not aborted. 

• If an IPR read to back-up PC with RLOG unwind is in progress, the unwind completes as 
normal, but the back-up PC write to the Ebox working register is disabled. All other Ibox IPR 
accesses are aborted. 

• IMD_VALID is reset in the CSU 

• The IREF-pending latch is cleared in the CSU 

• The PC_LD_MD - pending latch is cleared in the CSU 

• The IPR read/write select signals reset in the CSU 

• The stage 1 valid bit is cleared in the CSU 

• The source queue allocation counter is cleared in the OQU 

• The destination queue allocation counter is cleared in the OQU 

• The MD allocation counter is cleared in the OQU 

• The MD index counter is cleared in the OQU 

• The source and destination scoreboard counters are cleared in the SBU 

• Branch stalls are cleared in the BPU 

• I%FLUSH_IREF_LAT_H is asserted 

7.8 Initialization 

7.8.1 Mechanisms for Ibox State Reset 

The Ibox depends on the E%STOP_IBOX_H signal to initialize the states shown in Section 7.7. 
In addition, RESET_L is used to clear those states listed below which cannot be initialized by 
E%STOP_IBOX_H. 

• VIC_ENABLE is cleared in the VIC. 

• RLOG pointers are reset in the CSU. 

• The IDLE microword is injected into stage 1 of the CSU pipeline. 

• PC queue pointers are reset in the IIU. 



7.9 Errors, Exceptions, and Faults 

7.9.1 Overview 

The Ibox handles some of the processing for memory hardware errors, memory management 
exceptions, reserved opcode faults, and reserved addressing mode faults. A global view of 
error, exception, and fault handling is presented here. Implementation details are distributed 
amongst the Ibox sub-section text. 

Istream memory hardware errors may originate in the Mbox and memory subsystem or in the 
VIC array. Dstream memory hardware errors originate in the Mbox and memory subsystem. 
Istream and Dstream memory management exceptions originate in the Mbox. Reserved opcodes 
and reserved addressing modes are detected in Ibox hardware during instruction parsing. 

7.9.2 Istream Memory Errors 

When the Mbox qualifies returning Istream data with M%MME_FAULT_H or M%HARD_ERR_H, the 
VIC and PFQ writes are inhibited, prefetching is disabled, and the VIC sets appropriate condition 
flags for the IBU. The IBU continues to parse until it attempts to parse the Istream data that 
caused the exception or error. The condition flags are then forwarded to the Ebox. If the Ebox 
detects an empty instruction queue, source queue, destination queue, or field queue while the 
exception or error condition is asserted, the Ebox initiates an exception microtrap. 

Any PC load or E%STOP_IBOX_H resets the error and exception flags in the VIC. An Ibox PC 
load or E%RESTART_IBOX_H restarts prefetching and parsing. Thus if the error or exception 
gets forwarded to the Ebox, the Ebox can reset the Ibox flags, load a new PC and continue. If 
the instruction stream branches around the instruction stream data responsible for the error or 
exception, the Ibox resets the error flags and continues without reporting the condition. 

If a VIC parity error is detected, VIC prefetching and IBU instruction parsing are halted 
immediately and the error forwarded to the Ebox. This action is taken because the data containing 
the error may already have been loaded into the PFQ. If the Ebox detects an empty instruction 
queue, source queue, destination queue, or field queue while the exception or error condition is 
asserted, the Ebox initiates an exception microtrap. Section 7.2.1.7 and Section 7.3.2.15 contain 
the Ibox implementation details of Istream error and exception handling. See Table 8-12 and 
Section 8.5.19 for Ebox implementation details. 

7.9.3 Dstream Memory Errors 

Memory errors on incoming Dstream data are detected during the processing of some deferred 
mode specifiers. In auto-increment deferred and displacement deferred specifier modes, the 
complex specifier unit reads the address of an operand from memory. This memory read is 
followed either by a direct write to an Ebox MD, or an operand memory reference to read the 
actual operand into an Ebox MD and/or create a PA queue entry for a result store. 

If the Mbox returns M%MME_FAULT_H or M%HARD_ERR_H, then in the case of a direct MD write, 
the appropriate flag is sent with the MD write to the Ebox. If the Ebox detects one of the flags 
during an MD file access, it initiates an exception microtrap. If a memory operation is required 
to complete the processing of the specifier, the appropriate error or exception flag is sent with the 
memory request. The Mbox forces a memory management error or exception to occur for that 
reference, causing a fault flag to be returned to the appropriate Ebox MD. 

Section 7.4.2.2.2 contains the Ibox implementation details of Dstream error and exception 
handling. See Table 8-12 and Section 8.5.19 for Ebox implementation details and Section 12.6.5 
for Mbox implementation details. 

7.9.4 Reserved Opcode Faults 

Reserved opcode faults occur when the IBU detects unimplemented or reserved opcodes during 
instruction parsing. All such opcodes stop the parser and make an Ebox instruction queue entry 
containing a microcode dispatch for the reserved opcode routine. Section 7.3.2.12 contains the 
Ibox implementation details for reserved opcode handling. 

7.9.5 Reserved Addressing Mode Faults 

Reserved Addressing Mode Faults occur due to illegal combinations of specifier mode, specifier 
register, and access type. Unpredictable addressing modes occur due to combinations of specifier 
mode, specifier register, access type, and data length that do not make sense. Table 7-33 
summarizes the behavior of the Ibox on reserved and unpredictable addressing modes. Reserved 
addressing modes as specified by the VAX Architecture Standard always cause reserved 
addressing mode faults. Unpredictable addressing modes may produce a fault, or may be allowed 
to continue even though the result does not make sense. The processing of unpredictable modes 
never hangs the machine. 




NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 



Table 7-33: Reserved Addressing Mode Faults 

Address      Access     GPRs        Data      Indexed   Action 
Mode         Type                   Length 

S^#literal   Modify                                     take required fault 
S^#literal   Write                                      take required fault 
S^#literal   Address                                    take required fault 
S^#literal   Field                                      take required fault 
S^#literal                                    Yes       take required fault 
base[Rx]                PC                              take required fault 
base[Rx]                                      Yes       take required fault 
Rn           Address                                    take required fault 
Rn                                            Yes       take required fault 
(Rn)+        Modify     PC                              take required fault 
(Rn)+        Write      PC                              take required fault 
Rn                      PC                              source/dest queue entry has Rn=PC 
Rn                      SP          q,d,g               2nd source/dest queue entry has Rn=PC 
Rn                      SP,AP,FP    o,h                 unimplemented data lengths 
(Rn)                    PC                              (action illegible in source) 
-(Rn)                   PC                              Operand address is unpredictable 
-(Rn)                                         Rx=Rn     Rx read for index, then Rn read for base 
(Rn)+                                         Rx=Rn     Rx read for index, then Rn read for base 
@(Rn)+                                        Rx=Rn     Rx read for index, then Rn read for base 
(Rn)+        Address    PC                              PC after specifier byte passed as address 
(Rn)+                   PC                    Yes       Rx for index is read but not used 



When a Reserved Addressing Mode Fault is detected, I%RSVD_ADDR_FAULT_H is asserted, VIC 
prefetching is stopped, the IBU is stopped, and the CSU goes idle. A Reserved Addressing Mode 
Fault also blocks the OQU from making the source queue or destination queue entry associated 
with the faulting operand. 

If the Ebox detects an empty source queue, destination queue, or field queue while 
I%RSVD_ADDR_FAULT_H is asserted, the Ebox initiates an exception microtrap. 

All reserved addressing mode fault conditions are cleared in the Ibox when the Ebox loads a new 
PC. 

7.10 Ibox Signal Name Cross-Reference 

All signal names referenced in this chapter have appeared in bold and reflect the actual name 
appearing in the NVAX schematic set. For each signal appearing in this chapter, the table below 
lists the corresponding name which exists in the behavioral model. 



Table 7-34: Cross-reference of all names appearing in the Ibox chapter 

Schematic Name                 Behavioral Model Name 

I%BRANCH_BUS_H<1:0>            I%BRANCH_BUS_H<1:0> 
I%FORCE_HARD_FAULT_H           I%FORCE_HARD_FAULT_H 
I%FORCE_MME_FAULT_H            I%FORCE_MME_FAULT_H 
I%IBOX_IA_ADDR_H<3:0>          I%IBOX_IA_ADDR_H<3:0> 
I%IBOX_IA_READ_H               I%IBOX_IA_READ_H 
I%IBOX_IW_ADDR_H<4:0>          I%IBOX_IW_ADDR_H<4:0> 
I%IBOX_IW_BUS_H<31:0>          I%IBOX_IW_BUS_H<31:0> 
I%IBOX_IW_WRITE_H              I%IBOX_IW_WRITE_H 
I%IBOX_S_ERR_L                 I%IBOX_S_ERR_L 
I%IMEM_HERR_H                  I%IMEM_HERR_H 
I%IMEM_MEXC_H                  I%IMEM_MEXC_H 
I%IQ_BUS_H<22:0>               I%IQ_BUS_H<22:0> 
I%OPERAND_BUS_H<14:0>          I%OPERAND_BUS_H<14:0> 
I%PMUX0_H                      I%PMUX0_H 
I%PMUX1_H                      I%PMUX1_H 
I%RSVD_ADDR_FAULT_H            I%RSVD_ADDR_FAULT_H 
E%BCOND_RETIRE_L               E%BCOND_RETIRE_H 
E%BRANCH_MISPREDICT_L          E%BRANCH_MISPREDICT_H 
E%DQ_RETIRE_H                  E%DQ_RETIRE_H 
E%DQ_RETIRE_RMODE_H            E%DQ_RETIRE_RMODE_H 
E%DQ_RETIRE_RN_H<3:0>          E%DQ_RETIRE_RN_H<3:0> 
E%FLUSH_BPT_H                  E%FLUSH_BPT_H 
E%FLUSH_PCQ_H                  E%FLUSH_PCQ_H 
E%FLUSH_VIC_H                  E%FLUSH_VIC_H 
E%FPD_SET_L                    E%FPD_SET_H 
E%IBOX_IA_BUS_H<31:0>          E%IBOX_IA_BUS_H<31:0> 
E%IBOX_IPR_NUM_H<3:0>          E%IBOX_IPR_NUM_H<3:0> 
E%IBOX_IPR_READ_H              E%IBOX_IPR_READ_H 
E%IBOX_IPR_TAG_H<2:0>          E%IBOX_IPR_TAG_H<2:0> 
E%IBOX_IPR_WRITE_H             E%IBOX_IPR_WRITE_H 
E%IBOX_LOAD_PC_L               E%IBOX_LOAD_PC_H 
E%RESTART_IBOX_H               E%RESTART_IBOX_H 
E%RETIRE_INSTR_L               E%RETIRE_INSTR_H 
E%SQ_RETIRE_H<1:0>             E%SQ_RETIRE_H<1:0> 
E%SQ_RETIRE_MD_H<1:0>          E%SQ_RETIRE_MD_H<1:0> 
E%SQ_RETIRE_RMODE_H<1:0>       E%SQ_RETIRE_RMODE_H<1:0> 
E%SQ_RETIRE_RN1_H<3:0>         E%SQ_RETIRE_RN1_H<3:0> 
E%SQ_RETIRE_RN2_H<3:0>         E%SQ_RETIRE_RN2_H<3:0> 
E%STOP_IBOX_H                  E%STOP_IBOX_H 
I%FLUSH_IREF_LAT_H             I%FLUSH_IREF_LAT_H 
I%FORCE_HARD_FAULT_H           I%FORCE_HARD_FAULT_H 
I%FORCE_MME_FAULT_H            I%FORCE_MME_FAULT_H 
I%IBOX_ADDR_H<31:0>            I%IBOX_ADDR_H<31:0> 
I%IBOX_AT_L<1:0>               I%IBOX_AT_H<1:0> 
I%IBOX_CMD_L<4:0>              I%IBOX_CMD_H<4:0> 
I%IBOX_DL_L<1:0>               I%IBOX_DL_H<1:0> 
I%IBOX_REF_DEST_L<1:0>         I%IBOX_REF_DEST_H<1:0> 
I%IBOX_TAG_L<2:0>              I%IBOX_TAG_H<2:0> 
(missing in source)            I%IREF_REQ_H 
I%SPEC_REQ_H                   I%SPEC_REQ_H 
M%HARD_ERR_H                   M%HARD_ERR_H 
M%IBOX_DATA_L                  M%IBOX_DATA_H 
M%IBOX_IPR_WR_H                M%IBOX_IPR_WR_H 
M%LAST_FILL_H                  M%LAST_FILL_H 
M%MD_BUS_H<63:0>               M%MD_BUS_H<63:0> 
M%MD_BUS_QW_PARITY_L           M%MD_BUS_QW_PARITY_H 
M%MME_FAULT_H                  M%MME_FAULT_H 
M%QW_ALIGNMENT_H<1:0>          M%QW_ALIGNMENT_H<1:0> 
M%SPEC_Q_FULL_H                M%SPEC_Q_FULL_H 
M%VIC_DATA_L                   M%VIC_DATA_H 



7.11 Testability 

7.11.1 Overview 

Ibox testability is enhanced by architecturally accessible features, and connections to the internal 
scan register and the parallel port. 

7.11.2 Internal Scan Register and Data Reducer 

Ibox state can be latched into the scan register and shifted off-chip through the global internal 
scan register. The shift out begins with scan register bit 0. See Chapter 19 for the implementation 
details of the internal scan register. Table 7-35 lists the states in the Ibox scan register. Under 
global control from the test port, the Ibox scan register can be configured as an LFSR. 



Table 7-35: Ibox Scan Register Fields

Bit Field   Field Name          Description

<0>         STP_RESTART         Stop parser flag
<1>         STP_SUPPRESS        Stop parser flag
<2>         SHLIT               specifier control <0>, short literal
<8:3>       RN/SHORT LITERAL    specifier control <6:1>, register or shlit value
<11:9>      AT                  specifier control <9:7>, access type
<13:12>     DL                  specifier control <11:10>, data length
<14>        VALID               specifier control <12>, valid
<15>        COMPLEX             specifier control <13>, complex specifier
<18:16>     DISPATCH            specifier control <16:14>, dispatch address
<19>        AT_RMW              specifier control <17>, RMW
<20>        INDEXED             specifier control <18>, index
<21>        ASSIST              specifier control <19>, assist
<22>        PC_MODE             specifier control <20>, PC mode
<23>        JMP_OR_JSB          specifier control <21>, JMP or JSB
<25:24>     E_DL                execution data length <1:0>
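
As an illustration of how the scan register layout might be used during debug, the following Python sketch splits a captured 26-bit scan value into the fields listed in Table 7-35. It assumes that bit 0 of the integer corresponds to scan register bit <0>. The function name, the dictionary representation, and the Python-friendly spellings of the field names (for example SHLIT for the short literal flag) are assumptions made for this example and are not part of the chip specification.

```python
# Field layout transcribed from Table 7-35: name -> (high bit, low bit).
IBOX_SCAN_FIELDS = {
    "STP_RESTART": (0, 0),
    "STP_SUPPRESS": (1, 1),
    "SHLIT": (2, 2),
    "RN_SHORT_LITERAL": (8, 3),
    "AT": (11, 9),
    "DL": (13, 12),
    "VALID": (14, 14),
    "COMPLEX": (15, 15),
    "DISPATCH": (18, 16),
    "AT_RMW": (19, 19),
    "INDEXED": (20, 20),
    "ASSIST": (21, 21),
    "PC_MODE": (22, 22),
    "JMP_OR_JSB": (23, 23),
    "E_DL": (25, 24),
}


def decode_ibox_scan(value: int) -> dict:
    """Split a 26-bit Ibox scan value into its named fields."""
    fields = {}
    for name, (high, low) in IBOX_SCAN_FIELDS.items():
        width = high - low + 1
        # Shift the field down to bit 0 and mask off everything above it.
        fields[name] = (value >> low) & ((1 << width) - 1)
    return fields
```

A captured value can then be inspected field by field, e.g. `decode_ibox_scan(captured)["AT"]` for the access type bits.
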



7.11.3 Parallel Port 

The CSU microcode address is routed to the chip parallel port. The microcode address can be 
monitored on a cycle by cycle basis during chip debug by selecting the Ibox as source to the 
parallel port. When selected, a buffered version of the control store address, MUSLH<6:0>, appears 
on PP_DATA<6:0>. See Chapter 19 for the implementation details of the parallel port. 

7.11.4 Architectural Features

Internal processor registers are included as architectural features to aid in testability. IPR access 
to VIC tags and data is available through the VTAG and VDATA registers. See Section 7J2.1.16 
for the implementation details of these registers. IPR access to the branch history table and
branch status is available through the BPCR register. See Section 7.5.1.8 for the implementation 
details of the BPCR. 



7.12 Performance Monitoring Hardware 
7.12.1 Signals 

The Ibox provides two signals for performance monitoring: I%PMUX0_H asserts on every VIC access
and I%PMUX1_H asserts on every VIC hit. These signals enable the Ebox performance monitoring
hardware to gather statistics on VIC hits versus VIC accesses. 
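
The statistic these two signals make available can be sketched as follows. This Python fragment is only an illustration: it assumes a hypothetical per-cycle trace of (access, hit) samples standing in for the two signals, and it does not model how the Ebox performance monitoring hardware actually accumulates its counts.

```python
def vic_hit_statistics(trace):
    """Count VIC accesses and hits from per-cycle (access, hit) samples.

    Each sample is a pair of truthy/falsy values: the first stands in for
    the VIC-access signal, the second for the VIC-hit signal.
    """
    accesses = sum(1 for access, _ in trace if access)
    hits = sum(1 for _, hit in trace if hit)
    # Avoid dividing by zero when no access was observed.
    hit_rate = hits / accesses if accesses else 0.0
    return accesses, hits, hit_rate
```
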

7.13 Revision History



Table 7-36: Revision History

Who                                 When           Description of change

John F. Brown                       19-Feb-1991    Update following pass 1 tape out
John F. Brown, Ruben Castelino,     12-Jan-1990    Intermediate release.
Mary Field, Paul Gronowski,
Jeanne Meyer
John F. Brown, Paul Gronowski,      06-Mar-1989    Release for external review.
Jeanne McKinley
John F. Brown                       19-Dec-1988    Partial Update.
Shawn Persels                       06-Oct-1988    Initial release.

Chapter 8
The Ebox 



8.1 Chapter Overview 

This chapter describes the Ebox section of the NVAX CPU chip. Only the major functional blocks, 
their interfaces to each other, and the interface to the rest of the NVAX system are described here. 
Circuit level implementation details are not of primary concern in this document. 

8.2 Introduction 

The Ebox is the instruction execution unit in the NVAX CPU chip. It is a three-stage pipeline (S3..S5)
which runs semi-autonomously with respect to the rest of the NVAX chip and supports the following functions:

• Instruction Execution 

The Ebox is responsible for carrying out the execution portion of each VAX instruction under 
control of a microflow whose initial address is provided by the Ibox issue unit. 

• Instruction Coordination 

The Ebox is a major source of control to coordinate instruction processing in the Ibox, Mbox, 
and Fbox. It ensures that Ebox and Fbox macroinstructions retire in the proper order, and 
it provides controls to the Mbox and Ibox which help manage certain inter-macroinstruction 
dependencies. The Ebox cooperates with the Ibox in handling mispredicted branches. 

• Trap, Fault and Exception Handling 

The Ebox coordinates trap, fault, and interrupt handling. It delays handling of the condition until all
preceding macroinstructions complete properly. It then collects information about the condition
and ensures that the correct architectural state is reached.

• CPU Control 

Most CPU control is provided by the Ebox. Ebox control functions include CPU initialization, 
controlling Ibox, Fbox, and Mbox activities, and setting control bits during major CPU state 
changes (e.g. taking an interrupt or executing a change mode instruction). 

The Ebox accomplishes many of the above functions by executing the NVAX Ebox microcode. This 
chapter views the Ebox as the interpreter of microcode. Describing how microcode functions are 
used to correctly emulate the VAX architecture or the architectural motivation for Ebox hardware 
functions is generally outside the scope of this discussion. 

Figure 8-1 at the end of this section is a top level block diagram of the Ebox showing all the
major Ebox function units, their interconnections, and their place in the pipeline. The pipeline
segments are shown in the diagram (S2, S3, S4, and S5). The sections following the diagram
describe the function elements depicted and the Ebox pipeline.



Figure 8-1: Ebox Block Diagram







8.3 Chapter Structure 

The Ebox is described from both an overall functional and individual function unit standpoint. 
The top level description is of the major Ebox functions. The next level consists of a detailed 
description of each of the Ebox function units. 

The Ebox functions are described in the initial sections of this chapter. They are presented 
referring to the microcode fields which control the Ebox. Within each section the Ebox functions 
in question are discussed in detail and the Ebox function units which support that function are 
introduced. The functional overview is followed by a comprehensive description of each of the
Ebox function units.

The latter sections of this document describe Ebox initialization, timing, error handling, testability,
and other details not related to the main-line functionality of the Ebox.

8.4 Ebox Overview 
8.4.1 Microword Fields 

The Ebox is controlled by the data path control portion of the microword, which is either standard 
or special format. The other portion of the control word, the microsequencer control portion, 
controls the microsequencer which determines which microword is fetched in every cycle. The 
fields of the data path control portion of the microword and their effect within the Ebox are shown 
in Table 8-1. For more information on microword formats and field widths see Chapter 6.



NOTATION: The notation FIELD/FUNCTION is used throughout this chapter to mean that microword
field FIELD specifies FUNCTION.

Table 8-1: Data Path Control Microword Fields

Microword         Microword
Field             Format         Description

FORMAT            Both           This one-bit field determines whether the microword is in the
                                 special format. If it is 1, the MISC1, MISC2, and D fields exist.
                                 If it is 0, the Q, SHF, and VAL fields exist instead.

LIT               Both           This one-bit field determines whether the microword is the
                                 constant generation variant (format). If it is 1, the POS and
                                 CONST fields exist. If it is 0, the VAL and B fields exist
                                 instead in standard format, and the MISC2, D, and B fields
                                 exist instead in special format.

ALU               Both           Sets the ALU function, including typical ALU operations, and
                                 others.

MRQ               Both           Controls initiation of Ebox memory accesses and other Mbox
                                 control functions. The Ebox decodes the field and sends the
                                 corresponding request to the Mbox.

SHF               Standard       Sets the shifter function. The W and Q fields control how the
                                 shifter output is used. Some settings of this field specify a
                                 pass operation instead of a shift.

VAL               Standard [1]   Specifies the shift amount (1 to 31) or, if VAL = 0, specifies
                                 to shift the amount in the SC register.

A                 Both           Specifies the source of E_BUS%ABUS_L<31:0> for this microword.
                                 The A field can select any element in the register file or one
                                 of several Ebox sources. E_BUS%ABUS_L<31:0> is one of the two
                                 sources for the ALU and the shifter.

B                 Both [1]       When the source of E_BUS%BBUS_L<31:0> is a register, this field
                                 specifies the source of E_BUS%BBUS_L<31:0>. The B field can
                                 select from some of the elements in the register file or from
                                 a small number of other Ebox sources. E_BUS%BBUS_L<31:0> is one
                                 of the two sources for the ALU and the shifter.

POS               Both [2]       When the source of E_BUS%BBUS_L<31:0> is from the constant
                                 generator, this field specifies which byte the constant value
                                 is in. Bytes 0 through 3 may be specified. The other bytes are
                                 forced to 0.

CONST             Both [2]       This field contains the literal byte value which is sourced to
                                 one of the bytes of E_BUS%BBUS_L<31:0> as specified by the POS
                                 field. (The other E_BUS%BBUS_L<31:0> bytes are forced to 0.)

CONST.10 [3]      Both [2]       This field contains the literal 10-bit value which is sourced
                                 to E_BUS%BBUS_L<9:0>. (E_BUS%BBUS_L<31:10> are forced to 0.)

DST               Both           This field specifies the destination of E_BUS%WBUS_L<31:0>. The
                                 possible destinations include a subset of the register file and
                                 a number of other Ebox destinations.

Q                 Standard       Controls whether or not the Q register is loaded with the
                                 shifter output for this microword.

W                 Both           Selects the driver of E_BUS%WBUS_L<31:0>. Either the ALU or the
                                 shifter output is driven on E_BUS%WBUS_L<31:0>.

L                 Both           This field controls whether the Ebox operations are done with a
                                 data length of longword or the length specified in the DL
                                 register. The Ebox operations affected are condition code
                                 calculation, size of memory operations, zero extending

V                 Both           Controls updating of the VA register. Either the VA register is
                                 updated with the value from the ALU, or it is not changed from
                                 its previous value.

MISC              Both           This field has many uses. Only one use can be selected at a
                                 time. This field can control PSL condition code alterations,
                                 set the DL register, set or clear state flags, or invoke a box
                                 coordination or control function.

MISC1             Special        This field can specify one of a few Ibox or Fbox coordination
                                 or control functions, and can be used to set or clear state
                                 flags.

MISC2             Special [1]    One Mbox control function and one to add an Fbox destination
                                 scoreboard entry.

DISABLE.RETIRE    Special [1]    This field is used to disable retire of macroinstructions and
                                 retire queue entries.

[1] Not constant generation microword variant.

[2] Constant generation microword variant.

[3] The CONST.10 field is actually the POS field bitwise concatenated with the CONST field, with
the POS field in the more significant position. It is simply a way of treating these two microword
fields as one. CONST.10 is only used when MISC/CONST.10.BIT is specified.
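
The constant generation described for the POS, CONST, and CONST.10 fields can be sketched in a few lines of Python. This is a logical-value illustration only: the function names are hypothetical, the 2-bit and 8-bit field widths are inferred from "bytes 0 through 3" and "literal byte value", and the active-low polarity of the bus is ignored.

```python
def byte_constant(pos: int, const: int) -> int:
    """Logical 32-bit B-bus value with CONST placed in byte POS, other bytes 0."""
    if not 0 <= pos <= 3:
        raise ValueError("POS selects byte 0 through 3")
    if not 0 <= const <= 0xFF:
        raise ValueError("CONST is a byte value")
    return const << (8 * pos)


def const10(pos: int, const: int) -> int:
    """10-bit constant: POS concatenated with CONST, POS more significant."""
    if not 0 <= pos <= 3:
        raise ValueError("POS is a 2-bit value")
    if not 0 <= const <= 0xFF:
        raise ValueError("CONST is a byte value")
    # Bits <9:8> come from POS, bits <7:0> from CONST; bits <31:10> stay 0.
    return (pos << 8) | const
```

For example, POS = 2 with CONST = AB (hex) yields 00AB0000 (hex) as a byte constant, while the same two fields treated as CONST.10 yield 2AB (hex).
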

When a microword field is not present in all formats, it defaults to NOP (no operation) when a
microword format without that field occurs. More specifically, standard format microwords effectively
specify MISC1/NOP, MISC2/NOP, and DISABLE.RETIRE/NO by default. Special format microwords
effectively specify Q/HOLD.Q, SHF/NOP, and VAL/0. When the microword is the constant generation
variant of the standard format microword, VAL/0 is effectively specified, and the B field is ignored
since this microword variant sources a constant onto E_BUS%BBUS_L<31:0>. In the constant generation
variant of the special format microword, MISC2/NOP and DISABLE.RETIRE/NO are effectively
specified, and the B field is ignored because this microword variant also sources a constant onto
E_BUS%BBUS_L<31:0>.
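
The defaulting rules above can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the hardware decode: the dictionary representation of a microword and the function names are hypothetical, the retire-disable field is written DISABLE.RETIRE, and the set of fields present in each variant is taken from the FORMAT and LIT descriptions in Table 8-1.

```python
# Fields that Table 8-1 lists as present in both formats and both variants.
COMMON_FIELDS = {"ALU", "MRQ", "A", "DST", "W", "L", "V", "MISC"}

# Effective value of a field when the microword variant does not contain it.
DEFAULTS = {
    "MISC1": "NOP",
    "MISC2": "NOP",
    "DISABLE.RETIRE": "NO",
    "Q": "HOLD.Q",
    "SHF": "NOP",
    "VAL": "0",
}


def present_fields(special: bool, constant: bool) -> set:
    """Return the data path fields that exist in the given microword variant."""
    fields = set(COMMON_FIELDS)
    fields |= {"MISC1"} if special else {"Q", "SHF"}
    if constant:
        fields |= {"POS", "CONST"}
    elif special:
        fields |= {"MISC2", "DISABLE.RETIRE", "B"}
    else:
        fields |= {"VAL", "B"}
    return fields


def effective_microword(special: bool, constant: bool, specified: dict) -> dict:
    """Apply the defaulting rules to a partially specified microword."""
    present = present_fields(special, constant)
    # Fields that do not exist in this variant (such as B in a constant
    # generation microword) are ignored.
    word = {name: value for name, value in specified.items() if name in present}
    for name, default in DEFAULTS.items():
        if name not in present:
            word[name] = default
    return word
```
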

8.4.1.1 Microsequencer Control Fields 

In addition to decoding the datapath control portion of the microword, the Ebox decodes a part 
of the Microsequencer control portion of the microword. Specifically, it detects when the SEQ.FMT
and SEQ.MUX fields (see Chapter 9 and Chapter 6) specify LAST.CYCLE or LAST.CYCLE.OVERFLOW.
The Ebox fault detection logic and the RMUX control logic use these decodes. 

8.4.2 The Register File

The register file contains four kinds of registers: MD (memory data), GPR, Wn (working), and
CPUSTATE registers. The MD registers receive data from memory reads initiated by the Ibox, 
and from direct writes from the Ibox. The Wn registers hold microcode temporary data. They 
can receive data from memory reads initiated by the Ebox and receive result data from ALU, 
shifter, or Fbox operations, and from the Ibox in the case of Ibox IPR reads. The GPRs are the VAX 
architecture general-purpose registers (though R15 is not in the file) and can receive data from 
Ebox initiated memory reads, from the ALU or shifter, or from the Ibox. The CPUSTATE registers 
hold semipermanent architectural state (e.g. KSP, SCBB). They can only be written by the Ebox. 

8.4.3 ALU and Shifter 

Each microword specifies source operands for the ALU or shifter (A, B, POS, and CONST fields), 
operations for these function units to perform (ALU, SHF, and VAL fields), and a destination (or 
possibly two destinations if Q or VA is updated) for the result(s) (DST, Q, W, and V fields). Note
that in special format microwords no shifter operation can be specified and the Q register can't be 
altered. In the course of executing the microword, the Ebox will fetch the source operands onto 
E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0>, carry out the specified ALU and shifter functions, 
and store the result in the specified locations (if any). 

8.4.3.1 Sources of ALU and Shifter Operands 

In general the sources of E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0> (the inputs to the ALU
and shifter) are either a constant, a register from the register file, an Ebox register (e.g. PSL, Q,
or VA), an Ebox source value calculated by a special function unit, a hardware status provided via
a special path from outside the Ebox (e.g., interrupt status), or an entry from the source queue.
E_BUS%BBUS_L<31:0> sources are limited to a subset of the register file, certain Ebox registers,
and an entry from the source queue. The source queue is introduced in Section 8.4.4. 

8.4.3.2 ALU Functions

The ALU is capable of standard operations on byte, word, and longword size operands. It can 
pass either input to the output and is capable of a number of arithmetic and logical operations 
on one or two operands, producing condition codes based on data length and operation. It also 
has specialized functions which are discussed in Section 8.5.3. 

8.4.3.3 Shifter Functions 

The shifter does longword and quadword shift operations and certain pass-thru operations, always 
producing a longword output. The shifter treats the two sources as a single quadword, with 
E_BUS%ABUS_L<31:0> as the more significant longword. The longword output is this quadword
shifted right 0 to 32 bits and truncated to longword length. The shifter produces condition codes
based on the longword output data.
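
The data path behavior described above can be sketched as a small Python function. It models only the logical values of the two sources and the longword result; the function name is hypothetical, and condition code generation and bus polarity are not modeled.

```python
MASK32 = 0xFFFFFFFF


def shifter_output(abus: int, bbus: int, amount: int) -> int:
    """Right-shift the quadword {abus, bbus} and truncate to a longword.

    abus is the more significant longword, bbus the less significant one.
    """
    if not 0 <= amount <= 32:
        raise ValueError("shift amount must be 0 to 32")
    quadword = ((abus & MASK32) << 32) | (bbus & MASK32)
    return (quadword >> amount) & MASK32
```

A shift of 0 passes the less significant source unchanged and a shift of 32 passes the more significant source, which is one way the pass-through operations can be viewed.
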

8.4.3.4 Destinations of ALU and Shifter Results 

The output of the shifter and the output of the ALU can drive E_BUS%WBUS_L<31:0>. The shifter
output is also directly connected to the Q register so that the Q register can be loaded with the
shifter output regardless of the source of E_BUS%WBUS_L<31:0>. In the same way, the ALU output
is directly connected to the VA register. E_BUS%WBUS_L<31:0> data is the input to one of the write
ports on the register file and can be used to update any register file entry except an MD register.
Certain other Ebox registers (e.g. SC, PSL) can be loaded from E_BUS%WBUS_L<31:0>.

The destination of E_BUS%WBUS_L<31:0> can be specified by the current destination queue entry,
when the microword so specifies. The destination queue is introduced in the following section. 

8.4.4 Ibox-Ebox interface 

The Ibox-Ebox interface is made up of a number of FIFO queues. The purpose of these queues is to 
allow the Ibox to fetch and decode new instructions before the Ebox is ready to execute them. The 
Ibox adds entries as it decodes instructions, and the Ebox removes them from the other end as it 
executes them. For each opcode, there is a predetermined number of entries added to the various 
queues by the Ibox. Ebox execution microflows remove exactly the right number of entries from
each queue. 

The queues which interface the Ibox to the Ebox directly are the source queue, the destination 
queue, the branch queue, and the field queue. The instruction queue, the PA queue, and the 
retire queue are introduced here for completeness. 

The source queue holds source operand information. Entries are added by the Ibox as it decodes 
the source type operand specifiers of each instruction. The entry is either a pointer into the 
register file or the data from a literal mode operand specifier. The Ebox accesses and removes 
an entry each time a microword specifies a source queue access in either the A or B fields. If the 
entry is literal data, it is used as an ALU and/or a shifter operand. Otherwise the register file is 
accessed using the pointer in the entry. 
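
The source queue behavior just described might be modeled as follows. This Python sketch is illustrative only: the class, its method names, and the tagged-tuple entry format are assumptions, and an access to an empty queue (which in hardware leaves the Ebox waiting for the Ibox) is simply reported by `is_valid`.

```python
from collections import deque


class SourceQueue:
    """FIFO of source operand entries: register pointers or literal data."""

    def __init__(self):
        self._entries = deque()

    def add_register(self, index):
        # Ibox side: the operand lives in the register file.
        self._entries.append(("register", index))

    def add_literal(self, value):
        # Ibox side: the operand is literal data carried in the entry itself.
        self._entries.append(("literal", value))

    def is_valid(self) -> bool:
        """True when the Ibox has already added the current entry."""
        return bool(self._entries)

    def read(self, register_file):
        """Ebox side: access and remove the current entry, returning the operand."""
        kind, payload = self._entries.popleft()
        if kind == "literal":
            return payload
        return register_file[payload]
```
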

The destination queue holds result destination information. Entries are added by the Ibox as it 
decodes the destination type operand specifiers of each instruction. A destination queue entry 
is either a pointer to a GPR in the register file or a flag indicating that the result destination is 
memory. The Ebox accesses and removes an entry each time a microword specifies a destination 
queue access in the DST field or the Fbox supplies a result which specifies a destination queue 
access. If the entry is a pointer to a GPR, the Ebox writes the ALU, shifter, or Fbox data into the
register file. Otherwise the data is stored in memory at the address found in the PA queue. 

The PA queue is in the Mbox. Each time the Ibox adds an entry indicating a memory destination 
to the destination queue it also sends the Mbox a virtual address to be translated. When the 
Mbox has translated the address it puts it in the PA queue. If the current destination queue 
entry indicates a memory destination, the Ebox sends the result data to the Mbox to be written 
to the physical address found in the PA queue. The Mbox removes the PA queue entry as it uses 
it. 
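
The interaction between the destination queue and the PA queue can be sketched as shown below. The entry encoding, the function name, and the use of Python dictionaries for the register file and memory are assumptions made for illustration. Returning False stands in for the Ebox having to wait because a needed queue entry has not been added yet.

```python
from collections import deque


def store_result(dest_queue, pa_queue, register_file, memory, data) -> bool:
    """Store one result using the current destination queue entry.

    dest_queue holds ("gpr", n) or ("memory", None) entries added by the Ibox.
    pa_queue holds translated physical addresses added by the Mbox.
    Returns False when a required entry is not yet available.
    """
    if not dest_queue:
        return False
    kind, gpr = dest_queue[0]
    if kind == "gpr":
        dest_queue.popleft()
        register_file[gpr] = data
        return True
    if not pa_queue:
        # Memory destination, but the translated address has not arrived.
        return False
    dest_queue.popleft()
    # The PA queue entry is removed as it is used.
    memory[pa_queue.popleft()] = data
    return True
```
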

The branch queue holds status bits for each branch instruction processed by the Ibox. The Ibox 
adds an entry to the branch queue each time it finishes processing a conditional or unconditional 
branch. The Ebox references and removes the current branch queue entry in the execution 
microflow for the branch. This allows the Ebox to synchronize with the Ibox so that the branch
does not finish executing until the Ibox has successfully fetched the branch displacement specifier. 
It also allows the Ebox to check for an incorrect branch prediction by the Ibox. 

Each time the Ibox decodes a branch it calculates the branch address. For unconditional branches 
it simply begins fetching from the new instruction stream immediately. For conditional branches
the Ibox predicts whether the branch will be taken or not. The branch queue entry added by 
the Ibox indicates the branch prediction. When the Ebox executes an unconditional branch, it 
references the branch queue simply to ensure that the Ibox was able to fetch the displacement 
specifier without a fault or error. For conditional branches the Ebox also checks that the branch 
prediction was correct and initiates a microtrap if it wasn't. If the prediction wasn't correct, the
Ebox notifies the Ibox, which uses the alternate path PC (which it had kept) to begin fetching 
along the correct path. 
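
A minimal sketch of the check the Ebox makes against the branch queue follows. The entry layout, the function name, and the returned strings are hypothetical. In particular, the exact contents of a branch queue entry and the action taken on a fetch fault are not given in this section, so they are represented only schematically.

```python
from collections import deque
from typing import NamedTuple, Optional


class BranchEntry(NamedTuple):
    fetch_error: bool                # displacement fetch hit a fault or error
    predicted_taken: Optional[bool]  # None for an unconditional branch


def execute_branch(branch_queue, taken: bool) -> str:
    """Reference and remove the current branch queue entry.

    Returns "stall" if the Ibox has not added the entry yet, "fault" if the
    displacement fetch failed, "mispredict" if a conditional branch was
    predicted incorrectly, and "ok" otherwise.
    """
    if not branch_queue:
        return "stall"
    entry = branch_queue.popleft()
    if entry.fetch_error:
        return "fault"
    if entry.predicted_taken is not None and entry.predicted_taken != taken:
        # The Ebox would take a microtrap and notify the Ibox, which then
        # fetches from the alternate path PC it kept.
        return "mispredict"
    return "ok"
```
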

The retire queue holds status for each macroinstruction currently being executed in the Ebox 
or the Fbox. The status indicates which unit will execute the instruction, the Ebox or the Fbox. 
The Ebox adds an entry each time the Microsequencer dispatches to a macroinstruction execution 
microflow. The Ebox references the retire queue when the macroinstruction execution is complete 
in order to ensure that instructions finish executing in the proper order. A certain amount of 
concurrent execution in the Fbox and Ebox is possible. The retire queue is used to prevent one 
box from altering any architecturally visible state before the other box's execution for preceding 
macroinstructions finishes. The Ebox references and removes a retire queue entry each time an 
Fbox or Ebox instruction is retired. 
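
The ordering role of the retire queue can be illustrated with a short Python model. The class and method names are hypothetical; the only behavior taken from the text is that an entry records which unit executes each macroinstruction and that instructions retire in the order the entries were added.

```python
from collections import deque


class RetireQueue:
    """Records, in dispatch order, which unit executes each macroinstruction."""

    def __init__(self):
        self._units = deque()

    def dispatch(self, fbox: bool):
        # Added when the Microsequencer dispatches to an execution microflow.
        self._units.append("Fbox" if fbox else "Ebox")

    def may_retire(self, unit: str) -> bool:
        """A unit may retire only if the oldest entry names that unit."""
        return bool(self._units) and self._units[0] == unit

    def retire(self, unit: str) -> bool:
        """Remove the oldest entry if it belongs to unit; otherwise wait."""
        if not self.may_retire(unit):
            return False
        self._units.popleft()
        return True
```
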

The field queue holds a one-bit type status for variable-length bit field base address operands 
processed in the Ibox. (Note that some operands are treated as variable-length bit field base 
address operands internally by the NVAX CPU even though the operand is not really the base 
address of a variable-length bit field. These operands, including the true bit field base address 
operands, are collectively referred to as field operands.) The field queue entry indicates whether 
the field operand was register mode. The Ibox adds an entry when it processes operands which 
it knows by context require an entry. The Ebox retires an entry after it has used the information 
in a microcode conditional branch. Very different execution microflows are required for some
instructions, particularly bit field instructions, depending on whether a particular operand is
register mode or specifies a memory address. In the latter case the information sent by the Ibox
is a memory address, while in the former case the source and destination queue entries point to the
register in the register file. See Section 8.5.15.8 for more information. 






The instruction queue is part of the Ibox-Microsequencer interface. It holds information derived 
from the VAX instruction opcode. The Ibox adds an entry as it decodes each instruction. An 
entry contains the opcode, data length, the microcode dispatch address for execution, and a flag 
indicating whether the macroinstruction is for the Fbox. The Microsequencer references and
removes an entry at the start of execution of each VAX instruction. It uses the dispatch address to
fetch the first microword of the macroinstruction execution microflow. At the same time it passes
the opcode, data length, and the Fbox execution flag to the Ebox. The Ebox adds an entry to
the retire queue at that time. That entry is simply the Fbox execution flag (except if the Fbox is
disabled, see Section 8.5.15.7). See Section 9.2.3.3.4 for more on the instruction queue. 

8.4.5 Other Registers and States 

The Ebox contains several special purpose registers, the SC, VA, and Q registers, and the PSL. 
The SC register holds a shift count for use in some shift operations. 

The VA register can hold a virtual address or a microcode temporary value. The VA register is 
directly readable by the Mbox and is the address source for all Ebox initiated memory operations. 
The VA register is loaded directly from the ALU output. 

The PSL is the VAX architecture program status longword register. It is loaded from
E_BUS%WBUS_L<31:0> and can be used as a source operand by the ALU or shifter. Its bits are used in
many places in the Ebox and elsewhere in the CPU where required by the VAX architecture.

The Q register is loaded from the output of the shifter. It holds shifter results for later use. 

8.4.6 Ebox Memory Access 

Through the mechanism of the source queue and the destination queue, the Ibox initiates most 
memory accesses for the Ebox. In certain cases the Ebox must carry out memory accesses on 
its own. The MRQ field of the microword specifies the Mbox operation. The virtual or physical 
address is provided from the VA register. If the VA is being updated in this microword, the address 
is bypassed directly from the output of the ALU. For writes, the data is taken from
E_BUS%WBUS_L<31:0>, so it can be the output of the shifter or the ALU. For reads, the DST field of the microword
specifies the register file entry which is to receive the data. This register must be a GPR or a 
working register. 

8.4.7 CPU Control Functions 

Most control functions are invoked through one of the MISC fields, but some of the MRQ field 
functions are Mbox control functions or miscellaneous control functions rather than memory 
access commands. The control functions generally act to reset a function unit (Fbox, Ibox, or 
Mbox), synchronize Ebox operation with a function unit, or restart semiautonomous operation of 
the Mbox or Ibox when either of them has stopped for some reason. 

8.4.8 Ebox Pipeline

Execution of microwords in the Ebox is pipelined with three pipe stages (S3..S5). These stages are
shown in Figure 8-1. In the first stage (S3), the E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0>
sources are fetched or prepared. In the second (S4) the ALU and shifter operate on the data. In
the third (S5) the result is written into the register file or to some other destination. Stages
S3 and S4 can stall for various reasons. Stage S5 cannot stall. Once a particular microword's
execution has advanced into S5, it is going to complete. Various stalls occur in S4 in order to
ensure that a particular microword's effects do not change any architecturally visible state (e.g.,
GPRs, PSL) before proper completion without memory management faults is guaranteed.

The Microsequencer fetches the microword and delivers it to the Ebox in S3. If the Ebox's S3 
stage is stalled, the Microsequencer's S2 activity is stalled as well. See Chapter 9 for more detail.

Even though the operand fetch, function execution, and result store take place in different cycles, 
the microword specifies the operation as if it all took place in one cycle. The Ebox has bypass 
paths which allow a microword to use a register as a source even if it is updated by one of the two
preceding microwords. For example, if the immediately preceding microword updates Wi in the
register file and the current microword specifies Wi as a source to the ALU, the Ebox hardware
detects the condition and muxes the data into the staging latch before the ALU at the same time
as it forwards the data to the latch which sources E_BUS%WBUS_L<31:0> in stage S5.
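
The effect of the bypass paths on a source read can be sketched as follows. This Python fragment models only the selection of the value, not the latches and muxes involved. The list of in-flight results and the function name are hypothetical, and the sketch assumes a bypass exists for the register in question, which the hardware provides only where performance warrants.

```python
def read_with_bypass(register_file, in_flight, register):
    """Return the value a microword sees when it sources `register`.

    in_flight lists (destination, value) pairs for results of preceding
    microwords that are not yet written to the register file, oldest first.
    Only the two most recent are considered, and the newest match wins.
    """
    for destination, value in reversed(in_flight[-2:]):
        if destination == register:
            return value
    # No preceding microword in the pipeline updates this register.
    return register_file[register]
```
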

Bypass paths are only implemented where performance considerations warrant. Also, bypassing
isn't the solution to every problem pipelining introduces. For example, after the PSL is
updated the microcode allows 2 cycles before a microword specifying SEQ.MUX/LAST.CYCLE or
SEQ.MUX/LAST.CYCLE.OVERFLOW because the PSL is not actually updated until S5. The
Microsequencer uses the FPD, T, and TP bits in the PSL to determine the proper new microflow 
dispatch. It would make the decision based on old PSL information if the microcode didn't allow 
the 2 cycles. 

One place where the effect of pipelining is particularly apparent is in microcode conditional 
branches. For example, a microcode branch based on E_BUS%BBUS_L<31:0> data must immediately
follow the microword which sources the relevant data onto E_BUS%BBUS_L<31:0>. Similarly, a
microcode branch based on the ALU condition codes must be the second microword after the one 
which specified the ALU operation. See Chapter 9 for more detail on microcode branches. 

8.4.9 Pipeline Stalls 

The Ebox pipeline is controlled by the stall and fault logic. This function unit supplies stall 
signals which are used to gate clocking of control and data latches in each stage. It also controls 
insertion of effective no-ops into S4 when S3 is stalled and into S5 when S4 is stalled. 

The Ebox pipeline stalls in S3 when it is accessing a source operand in the register file or the 
source queue which is not valid. Many register file entries have a valid bit associated with them. 
A register file entry is not valid, and its valid bit is not set, if a memory read has been initiated 
for that entry and hasn't yet completed. A source queue entry is not valid if the Ibox hasn't added 
that entry yet. 

The Ebox stalls in S4 if the current destination queue entry is not valid and the microword in
S4 references a destination queue entry. A destination queue entry is not valid if the Ibox hasn't
added that entry yet.

The Ebox stalls in S4 if the current destination queue entry is valid but specifies a memory
destination for the data and the current PA queue entry is not valid. A PA queue entry is not 
valid if the Mbox hasn't added that entry yet. 

The Ebox stalls in S4 if the microword in S4 requests a memory operation and the Mbox is 
already working on an Ebox initiated memory operation (that is, the previous request is still in 
the EM_LATCH). 

The Ebox stalls in S4 if the microword in S4 synchronizes with the branch queue and the branch 
queue entry is not valid. A branch queue entry is not valid if the Ibox hasn't added that entry 
yet. 

The Ebox stalls in S4 if the current retire queue entry specifies that an Fbox instruction must 
retire before the instruction associated with the microword in S4 and the Ebox is requesting the 
use of the RMUX to store result data. (The Ebox requests the use of the RMUX if the microword in 
S4 specifies anything other than NONE in the DST field.) 
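
The S4 stall conditions listed above can be collected into a single predicate, sketched here in Python. The dataclass, its attribute names, and the reason strings are hypothetical; each test simply restates one of the preceding paragraphs.

```python
from dataclasses import dataclass


@dataclass
class S4Conditions:
    references_dest_queue: bool = False
    dest_entry_valid: bool = False
    dest_is_memory: bool = False
    pa_entry_valid: bool = False
    requests_memory_op: bool = False
    em_latch_busy: bool = False
    syncs_branch_queue: bool = False
    branch_entry_valid: bool = False
    fbox_retires_first: bool = False
    dst_field: str = "NONE"


def s4_stall_reasons(c: S4Conditions) -> list:
    """Return the reasons, if any, that the microword in S4 must stall."""
    reasons = []
    if c.references_dest_queue and not c.dest_entry_valid:
        reasons.append("destination queue entry not valid")
    if c.dest_entry_valid and c.dest_is_memory and not c.pa_entry_valid:
        reasons.append("PA queue entry not valid")
    if c.requests_memory_op and c.em_latch_busy:
        reasons.append("previous request still in EM_LATCH")
    if c.syncs_branch_queue and not c.branch_entry_valid:
        reasons.append("branch queue entry not valid")
    # The Ebox requests the RMUX whenever DST is anything other than NONE.
    if c.fbox_retires_first and c.dst_field != "NONE":
        reasons.append("Fbox instruction must retire first")
    return reasons
```

An empty list means S4 can advance; any entry in the list means S4, and therefore S3, is held.
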

If the Ebox stalls in S3, the S4 and S5 stages of the pipeline can continue execution. If S4 doesn't 
stall when S3 does, then an effective no-op is inserted into S4 after the current S4 operation 
advances into S5. The no-op is necessary so that the stalled S3 microword isn't advanced to S4 
and S5 while an S3 stall is in effect. See Section 8.5.20 for more detail. 

If the Ebox stalls in S4 then S3 stalls as well. (Microwords can't pass each other in the pipeline.) 
During S4 stalls, an effective no-op is inserted into S5 after the operation in S5 completes. This 
is necessary so that the operation in S4 isn't advanced into S5 while an S4 stall is in effect. See 
Section 8.5.20 for more detail. 

In any cycle that the Ibox has not made a microstore dispatch address available to the 
Microsequencer and a dispatch is needed (i.e., during the last cycle of any microflow), the
Microsequencer fetches the STALL microword. This microword specifies no Ebox operation and can't
cause a stall anywhere in the pipeline (although it does specify SEQ.MUX/LAST.CYCLE). This allows 
the microwords already in the pipeline to continue even when the Ibox is temporarily unable to 
supply new instruction execution dispatches. See Chapter 9 for more detail. 

A microcode loop which repeatedly accesses the field queue until the current field queue entry 
becomes valid is also very much like a stall, though the stall logic is not actually involved. This 
condition is referred to as a field queue stall. In this situation, the Ebox pipeline advances in 
each cycle (unless the microword in S4 is stalled also). However, the same microword is fetched 
out of the control store in every cycle. In typical microcode usage of the field queue conditional 
branch, this microword will not alter any state in S4 or S5. See Section 8.5.15.8 for more detail. 

8.4.10 Microtraps, Exceptions, and Interrupts 

The Ebox and Microsequencer together coordinate the handling of exceptions and interrupts. 
Most interrupts and some exceptions are handled by Microsequencer dispatching to a microcode 
exception handler routine at the end of the current VAX instruction. These dispatches do not affect 
the execution of microwords already in the pipeline. Other exceptions cause a microtrap. In a 
microtrap, the Microsequencer signals the Ebox to cause stages S3, S4, and S5 of the Ebox control
pipeline to be flushed. It also signals the Ebox to flush the retire queue. (Flushing of the other
Ibox-to-Ebox queues, the Fbox pipeline, and the specifier queue in the Mbox is done by microcode, 
except in the case of a branch misprediction.) At the same time the Microsequencer fetches a new 
microword from a special dispatch address in the control store based on the particular microtrap 
condition. This microflow handles any other necessary state flushing. Because a microtrap affects
microwords already in the pipeline, the Ebox delays handling most traps until the microword
which incurred the fault has reached S4. The microtrap is taken at the time that microword 
would normally have entered S5. In certain cases, Ebox stalls delay a microtrap until the stall 
is ended. The purpose of this is to ensure that operations which are part of a preceding VAX 
instruction are allowed to complete properly. 

Most of the microtraps which the Ebox delays until S4 are due to Ibox-initiated memory operations 
which had an access or translation fault. Faults due to Ibox-initiated reads are detected by the 
Ebox when it accesses a valid MD register from the register file, and the fault bit associated with 
that MD is set. Each MD register has a fault bit which is set by the Ibox or the Mbox when a fault 
occurs in the memory reads necessary to fetch the source data. When the Ebox accesses an MD 
register with its fault bit set in S3, it carries that fault status down the pipeline into S4. 

All faults detected in S3 are piped to S4 before they cause a microtrap. Faults detected in S4 or 
piped to S4 will cause a microtrap only if the Ebox is next to retire a macroinstruction. Otherwise 
they are delayed until the Fbox retires an instruction and the retire queue entry indicates the 
Ebox. 

Fault status signals are sent by the Ibox for entries in the instruction queue, source queue, field 
queue, destination queue, and branch queue. Entries in the PA queue have fault bits. The Ebox 
detects a fault when it accesses a PA queue entry with its fault bit set or when it finds the 
instruction queue, source queue, field queue, destination queue, or branch queue empty and one 
of the fault status signals from the Ibox asserted. In the case of the instruction queue, the fault is 
detected in S2 and carried into S3 only when there is no S3 stall. In the case of the source queue 
and field queue, the faults are detected in S3. Instruction queue, source queue, and field queue 
related faults are carried down the pipeline until they reach S4, where they cause a microtrap 
once the Ebox is next to retire a macroinstruction. 

Faults encountered in Ebox-initiated memory operations cause the Microsequencer to trap im- 
mediately. Ebox memory accesses begin in S5 so these traps cannot affect microwords from 
preceding VAX instructions. It is up to microcode to make sure that the last Ebox memory access 
has completed properly before the Microsequencer dispatches to another VAX instruction execution 
microflow. 

Hardware errors are essentially handled in the same way as faults. See Section 8.5.19.

8.5 Ebox Detailed Functional Description

8.5.1 Register File

The register file has 4 distinct groups of registers: MD (memory data), GPR, Wn (working registers), 
and CPUSTATE registers. There are a total of 37 registers in the file. There are 6 ports: 3 read 
ports and 3 write ports. The read ports are the A port, the B port, and the IA port. The write 
ports are the W port, the IW port, and the MD port. The result is UNPREDICTABLE if more than 
one write to the same location occurs at the same time. Section 8.5.1.4 explains why this never 
happens. 

8.5.1.1 Register Groups

The MD registers are only written by the Ibox directly or by the Mbox in completing an Ibox- 
initiated memory read. They are only read by the Ebox, and only accessed using a pointer from 
the source queue. There are 6 MD registers, MD0-MD5. 

The GPRs are all of the VAX general purpose registers, except R15 (PC). These are read and written 
by the Ebox in the course of instruction execution. The Mbox writes them to complete an Ebox- 
initiated memory read. The Ibox also reads and writes them. It reads them as it processes 
operand specifiers which use a GPR in an address calculation. It writes them as it processes 
autoincrement and autodecrement operand specifiers, and in unwinding the RLOG. There are 15 
GPRs, R0-R14 (R14 is often referred to as SP). 

Writes to GPRs can depend on the DL (data length) register. If the L field of the microword which 
caused the write specifies LONG, the full longword is written. If the microword specifies L/LEN(DL),
only the appropriate bytes are written. The following table shows which bytes are written in all
cases. 



Table 8-2: GPR Write Length

                                       Write Byte?
DL Register    L Field of Microword    3    2    1    0
-----------    --------------------    -    -    -    -
X              LONG                    Y    Y    Y    Y
BYTE           LEN(DL)                 N    N    N    Y
WORD           LEN(DL)                 N    N    Y    Y
LONGWORD       LEN(DL)                 Y    Y    Y    Y
QUADWORD       LEN(DL)                 Y    Y    Y    Y

X means don't care
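
The byte-enable rule in Table 8-2 can be sketched in a few lines of Python. This is an illustrative model only: the function names `gpr_write_bytes` and `merge_gpr_write`, and the use of strings for the L field and DL register values, are assumptions made for the example and are not part of the chip definition.

```python
# Hypothetical model of Table 8-2. Names and string encodings are
# assumptions for illustration; only the Y/N pattern comes from the table.

def gpr_write_bytes(l_field, dl):
    """Return byte-write enables as a tuple (byte3, byte2, byte1, byte0)."""
    if l_field == "LONG":
        # The DL register is a don't-care when the L field is LONG.
        return (True, True, True, True)
    if l_field != "LEN(DL)":
        raise ValueError("unknown L field value: %r" % (l_field,))
    table = {
        "BYTE":     (False, False, False, True),
        "WORD":     (False, False, True, True),
        "LONGWORD": (True, True, True, True),
        "QUADWORD": (True, True, True, True),
    }
    return table[dl]

def merge_gpr_write(old, new, l_field, dl):
    """Combine the old GPR contents with new write data, byte by byte."""
    mask = 0
    # reversed() walks the enables from byte 0 up to byte 3.
    for i, enabled in enumerate(reversed(gpr_write_bytes(l_field, dl))):
        if enabled:
            mask |= 0xFF << (8 * i)
    return (old & ~mask & 0xFFFFFFFF) | (new & mask)
```

For example, a byte-length write of AABBCCDD (hex) into a GPR holding 11223344 (hex) changes only byte 0 in this model.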



The Wn registers are used by microcode for temporary storage and to receive memory read data. 
They are only read by the Ebox using the A or B fields of the microword. They can be written by 
the Ebox, Mbox, or Ibox. The Mbox writes them in completing an Ebox memory operation. The 
Ibox only writes them when completing an Ebox-initiated read of an Ibox IPR. There are 6 Wn
registers, W0-W5.

The CPUSTATE registers are used by the microcode to hold elements of architectural state. They
are read and written only by the Ebox. There are 10 CPUSTATE registers: KSP, ESP, SSP, USP, ISP,
ASTLVL, SCBB, PCBB, SAVEPC, and SAVEPSL. 
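
As a cross-check of the register counts given for the four groups, the following Python sketch lists the groups and sums them. The dictionary layout and the names `REGISTER_GROUPS` and `total_registers` are assumptions made for illustration; only the register names and counts come from the text.

```python
# Illustrative layout of the register file groups described in this section.
# The data structure is an assumption; the names and counts are from the text.
REGISTER_GROUPS = {
    "MD": ["MD%d" % i for i in range(6)],       # MD0-MD5
    "GPR": ["R%d" % i for i in range(15)],      # R0-R14; R15 (PC) is not in the file
    "Wn": ["W%d" % i for i in range(6)],        # W0-W5
    "CPUSTATE": ["KSP", "ESP", "SSP", "USP", "ISP",
                 "ASTLVL", "SCBB", "PCBB", "SAVEPC", "SAVEPSL"],
}

def total_registers(groups=REGISTER_GROUPS):
    """Return the number of registers across all groups."""
    return sum(len(names) for names in groups.values())
```

Summing the groups gives 6 + 15 + 6 + 10 = 37, which agrees with the total stated in Section 8.5.1.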

8.5.1.2 Access Ports 

The A port and B port of the register file are read ports which can supply data to E_BUS%ABUS_L<31:0>
and E_BUS%BBUS_L<31:0>, respectively. These two ports are accessed in S3. The address
can be supplied directly from the A and B fields of the microword or indirectly through the source 
queue. Source queue addressing is specified in the A and/or B microword fields. The A port can 
read any register in the file; the B port can read any register in the file except a CPUSTATE. 

The W port is the write port connected to E_BUS%WBUS_L<31:0>. It receives a result from the Ebox
or Fbox in S5. It can write to the GPRs, CPUSTATEs, and Wn registers. The address can be supplied 
directly by the microword in the DST field or (for GPRs only) indirectly through the destination 
queue. Destination queue addressing is used when the microword specifies DST/DST or when the 
Fbox writes a result to a GPR. 

NOTE 

When the Ebox initiates a memory read by sending a request to the Mbox, it specifies 
the register which will receive the memory data in the DST field of the microword. 
This has the side effect, when the microword is in S5, of writing that register with
the value on E_BUS%WBUS_L<31:0>. Normally this register is written by the Mbox after
this, before the particular register is read again. However, an exception can prevent 
the Mbox write and leave the register containing effectively garbage data. 

The IA port is a read port used by the Ibox to read GPRs for use in general address calculation and
for autoincrement and autodecrement operand specifier processing. It can only read the GPRs. 
The address is supplied by the Ibox. 

The IW port is a write port used by the Ibox. It can write to the GPRs, the MD registers, and the 
Wn registers. The Ibox writes GPRs when it processes autoincrement and autodecrement operand 
specifiers and when unwinding the RLOG. It writes MD registers when operand specifier decoding 
requires passing a value (such as an address) to the Ebox. The Ibox writes the Wn registers only 
when responding to an Ebox-initiated IPR read. The address is supplied by the Ibox.

The MD port is used by the Mbox to write memory or IPR read data into Wn registers, MD registers, 
and GPRs. The Mbox writes MD registers to complete Ibox-initiated reads. It writes Wn registers 
or GPRs to complete Ebox-initiated reads. The register file address is supplied by the Mbox. (The 
Mbox received the register file address when the memory operation was initiated.) 
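
The port rules in this section can be summarized as a small lookup table. The Python below is a sketch under stated assumptions: the `PORT_ACCESS` structure and the `port_can_access` helper are hypothetical names for illustration, and the model ignores addressing modes (direct, source queue, destination queue) entirely.

```python
# Hypothetical summary of which register groups each port can reach.
# Direction and group sets are taken from the port descriptions above.
PORT_ACCESS = {
    "A":  ("read",  {"MD", "GPR", "Wn", "CPUSTATE"}),  # any register
    "B":  ("read",  {"MD", "GPR", "Wn"}),              # any except CPUSTATE
    "IA": ("read",  {"GPR"}),                          # Ibox reads GPRs only
    "W":  ("write", {"GPR", "CPUSTATE", "Wn"}),        # Ebox or Fbox results
    "IW": ("write", {"GPR", "MD", "Wn"}),              # Ibox writes
    "MD": ("write", {"Wn", "MD", "GPR"}),              # Mbox read data
}

def port_can_access(port, group):
    """Return True if the named port can read or write the register group."""
    return group in PORT_ACCESS[port][1]

def ports_for(direction):
    """Return the sorted list of ports with the given direction."""
    return sorted(p for p, (d, _) in PORT_ACCESS.items() if d == direction)
```

The table makes two properties easy to see: only the W port writes CPUSTATE registers, and only the A port reads them.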

8.5.1.3 Register File Bypass Paths 

The Ebox implements bypass for data being written into the register file or scheduled to be written 
into the register file further down the pipeline. Two techniques are employed: actual bypass 
datapaths and flow-thru bypass. Actual bypass paths are datapaths and drivers which directly 
drive the data onto E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0>. The register file E_BUS%ABUS_L<31:0>
or E_BUS%BBUS_L<31:0> drivers are automatically disabled when bypassed data is driven.
Flow-thru bypass is the technique in which a write to the register file occurs early in the cycle, 
well before the read. This way, reads see the result of writes which occur in the same cycle. 
This technique can only be used when the write data is available early enough and is scheduled 
to be written in that cycle. (For example, bypass of S4 Ebox results to E_BUS%ABUS_L<31:0>
or E_BUS%BBUS_L<31:0> can't be done with flow-thru bypass because the register file write isn't
supposed to happen until S5.) 

See Section 8.5.8 for a description of bypassing of Ebox or Fbox result data from S4 or S5. 

The register file has actual bypass paths for bypassing IW port writes to E_BUS%ABUS_L<31:0> and
E_BUS%BBUS_L<31:0>. The IW port write occurs too late in the cycle for flow-thru bypass to be
used. 
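
The effect of the IW port bypass on an A or B port read can be sketched as follows. This is a behavioral illustration only: `read_port`, its arguments, and the dictionary used as a register file are assumptions for the example, and bypass of Ebox or Fbox results (Section 8.5.8) is not modeled.

```python
# Behavioral sketch of an A or B port read in one cycle, with IW port bypass.
# All names here are hypothetical; timing within the cycle is not modeled.

def read_port(regfile, addr, iw_write=None):
    """Return the value seen on the bus for a read of register `addr`.

    iw_write is an optional (address, data) pair describing an IW port
    write that occurs in the same cycle as the read.
    """
    if iw_write is not None and iw_write[0] == addr:
        # The bypass path drives the bus and the register file driver
        # is disabled, so the read sees the data being written.
        return iw_write[1]
    return regfile[addr]
```

With a register file holding 1000 (hex) in R0, a read of R0 in the same cycle as an IW port write of 1004 (hex) to R0 returns 1004 (hex) in this model.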

NOTE 

IW port bypass is necessary for the NVAX CPU to correctly handle some sequences of 
operand specifier decoding. Here is one example. (To understand this example, the 
reader may need to know things which haven't been explained before this point in this 
specification.) Assume the CPU has to execute the following sequence of macroinstruc- 
tions: 

ADDL2 R0,(R0)+ 
ADDL2 R0,R1 

If the Ibox is executing far enough ahead of the Ebox and the read of memory data 
at (R0) takes a long time (as it would if neither the Pcache nor the Bcache contains
the data), then at some point the Ebox is stalled waiting for that data to arrive in an 
MD and the source and destination queues contain all the entries generated by the two 
ADDL2 instructions. The Ebox microword which executes ADDL2 is: 

A/S1, B/S2, ALU/A.PLUS.B, L/LEN(DL), MISC/LOAD.PSL.CC.IIII, SEQ.MUX/LAST.CYCLE.OVERFLOW

In S3 this microword accesses the first two entries in the source queue, which in this 
case point to R0 and some MD. The microword is stalled waiting for the memory read to 
complete (and the MD to become valid). The Ibox complex specifier unit (CSU) is stalled 
by the scoreboard unit (SBU) because it is just about to write R0+4 into the register file.
Because the Ebox must see the old value when it reads R0, the Ibox write to R0 must be
stalled. Once the Ebox retires the source queue entry containing the pointer to R0, the 
Ibox knows it can write R0. 

In cycle N the memory data arrives and is written into the MD. This ends the S3 stall in 
the Ebox. The very next microword to enter S3 (in cycle N+1) is for the second ADDL2.
It reads R0 and R1, and must see the new (incremented) value of R0.

In cycle N+1, the Ebox signals the Ibox of two source queue retires, the Ibox SBU ends
the CSU's stall, and the CSU writes R0+4 on the IW port. The Ebox reads R0 in that cycle
and, because of the IW port bypass, it sees the correct (autoincremented) value of R0.

When processing an autoincrement or autodecrement specifier for an address access type operand 
specifier, the Ibox does two sequential writes into the register file. The first writes the address 
into an MD register, the second writes the incremented or decremented register value back into 
the register. In some cases this can cause the Ebox to attempt to bypass both from the output of 
the RMUX in S4 and from the IW port to either or both of E_BUS%ABUS_L<31:0> and
E_BUS%BBUS_L<31:0>. In these cases the bypass from the output of the RMUX overrides the IW bypass. See
Section 8.5.8 for more on bypass from the output of the RMUX.

8.5.1.4 Write Collisions

The result is UNPREDICTABLE if more than one write to the same register file location occurs at the 
same time. To prevent this, writes to registers are controlled by certain hardware and microcode
mechanisms. 

The MD registers can only be written by the Ibox or the Mbox. The Ibox complex specifier unit 
has hardware which allocates and deallocates MD registers. The Mbox writes an MD only when 
returning data for an Ibox-initiated operand data read, and it writes to the particular MD specified 
by the Ibox. The Ibox writes an MD directly only when it knows that no outstanding reads to the 
same MD exist in the Mbox. Therefore, the Mbox and Ibox will never write an MD at the same
time. 

The GPRs can be written by the Ebox, Ibox, and Mbox. In many typical instruction execution 
situations, the Ebox never writes a GPR explicitly. It only writes them through destination queue 
accesses. The Ibox only writes GPRs to process autoincrement or autodecrement operand speci- 
fiers, so it always reads a given GPR prior to writing it. The Ibox scoreboard unit keeps track of 
which GPRs have been entered into the destination queue and allows Ibox complex specifier unit 
reads only when there are no Ebox writes outstanding. This means the Ibox will never write a 
GPR at the same time as the Ebox. 

When execution of a particular macroinstruction requires the Ebox to directly write GPRs, the Ibox 
is always stopped (the Ibox stops itself after processing the macroinstruction's operand specifiers). 
In these cases, microcode can write to any GPR without colliding with an Ibox write. The Mbox 
only writes a GPR when returning data for an Ebox-initiated Mbox operation. Microcode doesn't 
issue such a memory read unless it knows the Ibox is stopped, and microcode doesn't write the 
GPR while such an operation is outstanding. 

When unwinding the RLOG, the Ibox may write GPRs. The Ebox microcode knows this may be 
happening because the unwind was either initiated under microcode control or as a result of a 
branch mispredict. In either case the Ebox microcode doesn't write GPRs while the unwind is 
occurring. 

The Ebox, Ibox, and the Mbox can write the Wn registers. The Mbox only writes a Wn register 
when returning read data for an Ebox-initiated Mbox operation. The Ibox only writes Wn registers 
to return IPR data at the Ebox's request. Microcode never writes a Wn if there is an Mbox or Ibox
operation outstanding which will write the same register. 

Only the Ebox can write CPUSTATE registers, so there is no possibility of a write collision on those 
registers. 

8.5.1.5 Valid, Fault, and Error Bits 

Some of the registers in the register file have valid bits and/or fault and error bits associated with 
them. There is one valid bit, one fault bit, and one error bit associated with each MD register. 
The Wn registers each have a valid bit, but no fault or error bits. 

Valid bits are used to allow synchronization with memory reads. Whenever a memory read to 
a Wn register is initiated, the associated valid bit is cleared. The valid bit for an MD register 
is cleared as a side effect of reading it, so it is already cleared when a memory read to it is 
initiated. (The MD valid bits are also cleared in exception cases, by MISC/RESET.CPU.) When the 
Mbox supplies the data, the valid bit is set. If the microword in S3 reads from an MD or Wn 
register whose valid bit is not set, the pipeline stalls in S3.

Fault and error bits are used to indicate that some sort of exception occurred with the memory
read. Fault bits indicate memory management exceptions, while error bits indicate hardware 
errors. When the microword in S3 reads an MD register whose fault or error bit is set, a microtrap 
is scheduled for this microword. The microtrap is delayed in the pipeline as is discussed in 
Section 8.5.19. Fault and error bits are needed to delay Ebox detection of memory exceptions 
until the Ebox is processing the associated VAX instruction. A set fault or error bit indicates an 
Ibox or Mbox detected exception condition related to source operand specifier processing. If the 
Mbox was unable to complete an Ibox-initiated memory operation targeted to MD, it sets the fault 
or error bit. If the Ibox encountered any sort of fault or error before initiating the final memory 
read necessary to process an operand specifier, it sets the fault or error bit directly. In either case 
the Ebox will not detect the fault until it is executing the associated VAX instruction. There is no 
need for Wn register fault bits because microtraps due to Ebox memory reads are taken as soon 
as they are reported by the Mbox. 

All the Wn register valid bits are set unconditionally in S3 of each new macroinstruction execution 
microflow. The Microsequencer signals the Ebox at the start of these microflows. This is done to
prevent errors from causing the pipeline to stall waiting for a condition which will never be true.
If an error causes an Ebox memory read to a particular Wn register to fail to complete, it leaves
the valid bit cleared. Without the unconditional set, a new microflow which references the same
working register would stall and, since the memory operation will never complete, the stall would
never end.

All Wn register valid bits are set unconditionally when the MISC field of the microword specifies 
RESET.CPU. 

Wn register valid bits are normally set. A Wn register's valid bit is cleared in S4 if the microword
specifies a memory read which will deliver data to that register. The bit is set when the Mbox or 
Ibox writes to that register. It is not altered by Ebox (A, B, or W port) accesses. The S4 clear of a 
Wn valid bit will cause the current S3 microword to stall if it references Wn. 

All the MD valid bits are cleared when the microword MISC field specifies RESET.CPU. MD valid bits 
are not normally set. In normal operation, an MD register's valid bit is set when the Mbox or 
Ibox writes that register, and is cleared as a side effect of the Ebox reading the register. 
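
The valid-bit behavior described in this section can be summarized as a small state model. The Python class below is a sketch under stated assumptions: the class name `ValidBits` and its method names are invented for illustration, fault and error bits are left out, and pipeline timing (S3 versus S4) is reduced to the order in which the methods are called.

```python
# Hypothetical model of the Wn and MD valid bits. Only set/clear behavior
# is modeled; fault bits, error bits, and cycle timing are omitted.

class ValidBits:
    def __init__(self):
        self.wn = [True] * 6    # Wn valid bits are normally set
        self.md = [False] * 6   # MD valid bits are not normally set

    def start_ebox_read_to_wn(self, n):
        """A microword in S4 specifies a memory read that targets Wn."""
        self.wn[n] = False

    def external_write_wn(self, n):
        """The Mbox or Ibox writes Wn."""
        self.wn[n] = True

    def external_write_md(self, n):
        """The Mbox or Ibox writes MDn."""
        self.md[n] = True

    def s3_read_wn(self, n):
        """Return True if a read of Wn in S3 must stall."""
        return not self.wn[n]

    def s3_read_md(self, n):
        """Return True if a read of MDn in S3 must stall.

        A successful read clears the valid bit as a side effect.
        """
        if not self.md[n]:
            return True
        self.md[n] = False
        return False

    def new_macroinstruction(self):
        """All Wn valid bits are set at the start of each new microflow."""
        self.wn = [True] * 6

    def reset_cpu(self):
        """MISC/RESET.CPU sets all Wn valid bits and clears all MD valid bits."""
        self.wn = [True] * 6
        self.md = [False] * 6
```

The `new_macroinstruction` method shows why a failed Ebox read cannot stall a later microflow indefinitely in this model: the stale cleared bit is overwritten before the next microflow can reference the register.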

8.5.2 Constant Generation 

There are two constant generators, an extremely simple E_BUS%ABUS_L<31:0> constant source
and a more complicated E_BUS%BBUS_L<31:0> source. The E_BUS%ABUS_L<31:0> constant source is
specified in the A field of the microword. It can produce the following longword constants: 0, 1.
To source these constants to E_BUS%ABUS_L<31:0>, the microword specifies K0 or K1, respectively,
in the A field.

The E_BUS%BBUS_L<31:0> constant generator builds a longword constant by placing a byte value 
in one of the four byte positions in the longword. The POS and CONST fields of the microword 
specify the value. The CONST field contains a byte value, while the POS field specifies the byte 
in the longword in which the value appears. The other bytes are zero. It is as if the POS field 
specified a left shift with zero fill of the CONST value. 
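
The POS/CONST rule amounts to a byte-granular left shift. A minimal Python sketch, assuming POS is encoded as an integer 0 through 3 with 0 meaning the least significant byte (the function name `bbus_constant` is hypothetical):

```python
# Sketch of the POS/CONST constant generator. The integer encoding of POS
# (0 = least significant byte) is an assumption for this example.

def bbus_constant(pos, const):
    """Return the longword with the byte `const` placed in byte position `pos`."""
    if not 0 <= pos <= 3:
        raise ValueError("POS selects one of four byte positions")
    if not 0 <= const <= 0xFF:
        raise ValueError("CONST is a byte value")
    return const << (8 * pos)   # all other bytes are zero
```

For example, a CONST of AB (hex) in byte position 2 yields 00AB0000 (hex).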

The POS and CONST fields are part of the constant generation variant of the microword. In this 
variant the VAL and B fields of the standard format microword, or the MISC2, DISABLE.RETIRE,
and B fields of the special format, are replaced by the POS and CONST fields. In the constant 
generation variant, E_BUS%BBUS_L<31:0> receives the constant so the B field is unnecessary. Also, 
the shifter uses the SC register for the shift amount so the VAL field is not needed (put another
way, VAL/0 is effectively specified by the constant generation variant). Similarly, MISC2/NOP and
DISABLE.RETIRE/NO are effectively specified by constant generation variant microwords.

Under control of the MISC field, the E_BUS%BBUS_L<31:0> constant generator can also provide a 
constant in which the low order 10 bits are specified by microcode and the high order 22 bits are 
all zero. This mode of constant generation occurs when the MISC field specifies CONST.10.BIT. In this 
case the 10 bit constant is sourced from the CONST.10 field of the microword. (The CONST.10 field is 
formed by concatenating the two-bit POS field with the 8-bit CONST field, with the POS field more 
significant.) The microword format must be the constant generation variant, if MISC/CONST.10.BIT 
is specified. 
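
The 10-bit mode simply concatenates the two fields. A minimal Python sketch, with the hypothetical name `bbus_constant_10` and integer field encodings assumed for illustration:

```python
# Sketch of MISC/CONST.10.BIT constant generation: the two-bit POS field is
# concatenated above the 8-bit CONST field. Field encodings are assumptions.

def bbus_constant_10(pos, const):
    """Return the longword whose low 10 bits are POS'CONST and whose
    high 22 bits are zero."""
    if not 0 <= pos <= 3:
        raise ValueError("POS is a two-bit field")
    if not 0 <= const <= 0xFF:
        raise ValueError("CONST is an 8-bit field")
    return (pos << 8) | const
```

The largest value this mode can produce is 3FF (hex), with POS equal to 3 and CONST equal to FF (hex).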

The E_BUS%BBUS_L<31:0> constant generator can also provide the constant 0000FFFF#16. It is
produced when the B field of the microword specifies K.FFFF.

8.5.3 The ALU 

The ALU is a 32-bit function unit capable of arithmetic and logical operations. Its inputs are
E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0>. Its output drives E_ALU%RESULT_H<31:0>, which can
be muxed onto E_BUS%WBUS_L<31:0> and is directly connected to the VA register (see Section 8.5.6).
It also produces condition codes (ALU<C>, ALU<N>, ALU<V>, ALU<Z>) based on the results of its 
operation. The ALU condition codes are data length dependent, with the data length coming from 
the DL register or defaulting to longword depending on the microword L field. The ALU operation 
is specified by the ALU field of the microword. 

The following table shows the ALU operations by name, and gives a description of each operation. 



Table 8-3: ALU Operations

ALU Operation Name     Operation Description
------------------     ---------------------
PASS.A                 E_ALU%RESULT_H <- A
PASS.B                 E_ALU%RESULT_H <- B
A.AND.B                E_ALU%RESULT_H <- A .AND. B
A.AND.NOT.B            E_ALU%RESULT_H <- A .AND. (.NOT. B)
A.OR.B                 E_ALU%RESULT_H <- A .OR. B
A.XOR.B                E_ALU%RESULT_H <- A .XOR. B
NOT.A.AND.B            E_ALU%RESULT_H <- (.NOT. A) .AND. B
A.PLUS.1               E_ALU%RESULT_H <- A + 1
A.PLUS.B               E_ALU%RESULT_H <- A + B
A.PLUS.B.PLUS.1        E_ALU%RESULT_H <- A + B + 1
B.MINUS.A              E_ALU%RESULT_H <- B - A = B + (.NOT. A) + 1
A.MINUS.B              E_ALU%RESULT_H <- A - B = A + (.NOT. B) + 1
A.MINUS.B.MINUS.1      E_ALU%RESULT_H <- A - B - 1 = A + (.NOT. B)
A.MINUS.1              E_ALU%RESULT_H <- A - 1
A.PLUS.4               E_ALU%RESULT_H <- A + 4
A.MINUS.4              E_ALU%RESULT_H <- A - 4
NEG.B                  E_ALU%RESULT_H <- -B (minus B)
NOT.B                  E_ALU%RESULT_H <- .NOT. B (ones complement of B)
SMUL.STEP              E_ALU%RESULT_H <- A .SMUL. B (Q register is affected, see text)
UDIV.STEP              E_ALU%RESULT_H <- A .UDIV. B (Q register is affected, see text)



The following signals are used in functional descriptions below: 

• E_ALU%RESULT_H<N> is the nth bit of the ALU result. 

• E_ALU%CI_H<N> is the nth carry-in bit in the ALU. It is the carry into the nth bit slice. The
carry-in to the ALU is E_ALU%CI_H<0>, while the carry out for longword data length is
E_ALU%CI_H<32>.



8.5.3.1 ALU Condition Codes 

The four condition codes calculated by the ALU are: 

• ALU<V> - Integer Overflow

This bit indicates an integer overflow from the operation. It is the XOR of the carry in to the
most significant bit with the carry out of the same bit. The calculation depends on the data
length in effect for the operation. It is E_ALU%CI_H<N> .XOR. E_ALU%CI_H<N+1> where N is 7,
15, or 31 for byte, word, or longword data length, respectively.

• ALU<C> - Carry Out

This bit is the carry out from the operation. It is E_ALU%CI_H<8>, E_ALU%CI_H<16>, or
E_ALU%CI_H<32> for byte, word, or longword data length, respectively.

• ALU<Z> - Zero

This bit indicates that the ALU result was zero. It is the logical NOR of
E_ALU%RESULT_H<7:0>, E_ALU%RESULT_H<15:0>, or E_ALU%RESULT_H<31:0> for byte, word, or
longword data length, respectively.

• ALU<N> - Negative

This bit indicates that the ALU result was negative. It is simply E_ALU%RESULT_H<7>,
E_ALU%RESULT_H<15>, or E_ALU%RESULT_H<31> for byte, word, or longword data length,
respectively.

For logical and PASS operations the ALU<C> and ALU<V> condition code bits are always zero. 
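
The four condition code definitions can be checked against a small model of an add operation. The Python below is a sketch, not a description of the ALU circuit: the function name `alu_add`, the `SIZE_BITS` table, and the dictionary of flags are assumptions for illustration, and only addition with an optional carry-in is modeled.

```python
# Hypothetical model of ALU condition codes for an add, following the
# definitions above: V = CI<N> xor CI<N+1>, C = CI<N+1>, Z = NOR of the
# result bits in the data length, N = result bit N.

SIZE_BITS = {"BYTE": 8, "WORD": 16, "LONGWORD": 32}

def alu_add(a, b, carry_in=0, length="LONGWORD"):
    """Add two 32-bit bus values; return (result, flags) for the data length."""
    n = SIZE_BITS[length] - 1                  # index of the most significant bit
    result = ((a & 0xFFFFFFFF) + (b & 0xFFFFFFFF) + carry_in) & 0xFFFFFFFF
    low_mask = (1 << n) - 1                    # bits below the msb
    full_mask = (1 << (n + 1)) - 1             # all bits in the data length
    ci_n = ((a & low_mask) + (b & low_mask) + carry_in) >> n            # carry into bit n
    ci_n1 = ((a & full_mask) + (b & full_mask) + carry_in) >> (n + 1)   # carry out of bit n
    flags = {
        "N": (result >> n) & 1,
        "Z": int((result & full_mask) == 0),
        "V": ci_n ^ ci_n1,
        "C": ci_n1,
    }
    return result, flags
```

Adding 7F (hex) and 1 at byte length sets N and V with C clear; adding FF (hex) and 1 at byte length sets Z and C with V clear.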

The ALU condition codes are available on the microtest bus and can be used to update the PSL. 
If the microword following the one setting the ALU condition codes is stalled, the Ebox control 
logic holds the ALU condition code bits constant until the microword branching on them is ready 
to use them. The effect is the same as if no stall had occurred. See Section 8.5.14 and Chapter 9 
for more about the microtest bus and see Section 8.5.5 and Section 8.5.10.1 for more detail on 
setting PSL condition code bits. 

If the ALU operation is SMUL or UDIV, the ALU condition codes correspond to the ALU result before 
the one-bit shift is done on the result.

8.5.3.2 SMUL Step Definition

The signed multiplication step is used to implement the sequential add and shift multiplication 
algorithm. It allows microcode to implement byte, word, and longword multiplication of two 
operands. The SMUL step uses the single bit left or right shifter at the output of the ALU, the Q 
register, and two microcode working registers. 

The operation of a single SMUL step is described in Figure 8-2. The proper number of SMUL steps 
is controlled by the microcode and depends upon the data length of the operation. 

The SMUL step operation selects the ALU operation (either PASS.A or A.PLUS.B) based on the least
significant bit of the Q register. However, the Q register must not have been loaded by the previous
microword unless that microword specified an SMUL step. This is because that bit of the Q register 
is not ready in time to control the ALU operation if the Q register was loaded from the output of 
the shifter in the previous cycle. 

Figure 8-2: SMUL Step Operation 



At start:

    Wb = multiplicand

Operation: For Wa <-- Wa .SMUL. Wb

    o IF Q<0> = 1
        THEN E_ALU%RESULT_H<31:0> <-- Wa + Wb (Partial Product + Multiplicand)
        ELSE E_ALU%RESULT_H<31:0> <-- Wa (Partial Product)

    o WBUS<31:0> <-- (E_ALU%RESULT_H<31> .XOR. E_ALU%CI_H<31> .XOR. E_ALU%CI_H<32>) ' E_ALU%RESULT_H<31:1>

    o Q<31:0> <-- E_ALU%RESULT_H<0> ' Q<31:1>

At end:

    Wa ' Q = product

NOTE: E_ALU%RESULT_H is the value of the ALU before the single-bit shift.

Description: The lsb of the Q register is tested for a 0 or 1. If Q<0> EQL 0, then
the partial product is passed through the ALU unmodified. If Q<0> EQL 1, then the
partial product and the multiplicand are added together. Then the output of the
ALU and the Q register are shifted right one bit. The shift into the msb of WBUS is
the exclusive-or of the ALU's output sign and the arithmetic overflow out of the ALU
(arithmetic overflow is the exclusive-or of the carry-in and carry-out of the msb).
The shift into the msb of Q comes from E_ALU%RESULT_H<0>.
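
The SMUL step can be exercised with a short behavioral model. The Python below is a sketch under several assumptions that the figure does not state: the Q register starts with the multiplier, Wa starts at zero, 32 steps are used for longword operands, the multiplier is non-negative (any correction for a negative multiplier is outside this sketch), and the carries are taken as zero on the pass-through path. The names `smul_step` and `smul` are invented for the example.

```python
# Behavioral sketch of the SMUL step and a longword multiply built from it.
# Assumptions: Q = multiplier and Wa = 0 at start, non-negative multiplier,
# and carries of zero when the partial product is passed unmodified.

MASK32 = 0xFFFFFFFF

def smul_step(wa, wb, q):
    """Perform one SMUL step; return the new (wa, q) pair."""
    if q & 1:
        total = wa + wb                                        # add multiplicand
        ci31 = ((wa & 0x7FFFFFFF) + (wb & 0x7FFFFFFF)) >> 31   # carry into bit 31
        ci32 = total >> 32                                     # carry out of bit 31
    else:
        total, ci31, ci32 = wa, 0, 0                           # pass partial product
    result = total & MASK32
    msb_in = (result >> 31) ^ ci31 ^ ci32      # sign corrected for overflow
    new_wa = (msb_in << 31) | (result >> 1)    # shift ALU output right one bit
    new_q = ((result & 1) << 31) | (q >> 1)    # ALU bit 0 enters the msb of Q
    return new_wa, new_q

def smul(multiplicand, multiplier, steps=32):
    """Multiply a signed multiplicand by a non-negative multiplier."""
    wa, wb, q = 0, multiplicand & MASK32, multiplier & MASK32
    for _ in range(steps):
        wa, q = smul_step(wa, wb, q)
    product = (wa << 32) | q                   # Wa ' Q
    return product - (1 << 64) if product >> 63 else product
```

The exclusive-or of the result sign with the overflow is what keeps the partial product correct when an add overflows 32 bits: the true sign of the sum is shifted into the msb, so the 64-bit value Wa ' Q remains a valid two's complement number after every step.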



8.5.3.3 UDIV Step Definition 

The unsigned division step is used to implement the sequential shift and subtract non-restoring 
division algorithm. It allows microcode to implement byte, word, and longword division of two 
operands, and to produce the remainder. The UDIV step uses the single bit left or right shifter at 
the output of the ALU, the Q register, and two microcode working registers.

The operation of a single UDIV step is described in Figure 8-3. The proper number of UDIV steps
is controlled by the microcode and depends upon the data length of the operation. The unsigned 
divide algorithm using the UDIV step requires microcode to shift the remainder one bit to the 
right after the final UDIV step. 

Figure 8-3: UDIV Step Operation 



Note that non-restoring division uses the fact that

    2 * (Partial Remainder - Divisor + Divisor) - Divisor =
    2 * (Partial Remainder - Divisor) + Divisor

At start:

    Q register = dividend
    Wb = divisor
    Wa = 0 (except during an extended divide, when
            Wa contains the high-order longword of
            the dividend)

Operation: For Wa <-- Wa .UDIV. Wb

    This operation results in the Q register containing the quotient and
    Wa containing the remainder.

    o IF ALU_CC.C = 1
        THEN E_ALU%RESULT_H <-- Wa - Wb (Partial Remainder/Quotient - Divisor)
        ELSE E_ALU%RESULT_H <-- Wa + Wb

    o WBUS<31:0> <-- E_ALU%RESULT_H<30:0> ' Q<31>

    o Q<31:0> <-- Q<30:0> ' (.NOT. E_ALU%RESULT_H<31>)

    o ALU_CC.C <-- E_ALU%CI_H<32>

At end:

    Q register = quotient
    Wa = remainder

NOTE: E_ALU%RESULT_H is the value of the ALU before the single-bit shift.

Description: ALU_CC.C is tested for a 0 or 1. If ALU_CC.C EQL 1, then Wb is subtracted from Wa.
If ALU_CC.C EQL 0, then Wa and Wb are added together. The output of the ALU is
then rotated to the left one bit and driven onto the WBUS with WBUS<0> being driven
by Q<31>. Additionally, the Q register is rotated left one bit with the
complement of the bit shifted out of the ALU result becoming Q<0>. The new
ALU_CC.C condition flag comes from the carry out of the ALU (E_ALU%CI_H<32> here).
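
The identity quoted at the top of Figure 8-3 is the basis of non-restoring division in general. The Python below is a sketch of the textbook algorithm using unbounded integers; it is not a cycle-accurate model of the UDIV step. In particular, the initial state of ALU_CC.C, the exact number of steps, and the final remainder adjustments made by microcode are not modeled, and the function name `nonrestoring_divide` is invented for the example.

```python
# Textbook non-restoring unsigned division, shown only to illustrate the
# identity used by the UDIV step. This is NOT the UDIV microcode sequence.

def nonrestoring_divide(dividend, divisor, bits=32):
    """Return (quotient, remainder) for unsigned operands, divisor nonzero."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    r = 0   # signed partial remainder
    q = 0
    for i in range(bits - 1, -1, -1):
        bit = (dividend >> i) & 1
        if r >= 0:
            r = 2 * r + bit - divisor   # previous subtract succeeded: subtract again
        else:
            r = 2 * r + bit + divisor   # previous subtract went negative: add instead
        q = (q << 1) | (1 if r >= 0 else 0)
    if r < 0:
        r += divisor                    # single final correction of the remainder
    return q, r
```

The add branch is where the identity applies: rather than restoring a negative partial remainder by adding the divisor back and then subtracting again after the shift, the algorithm shifts the negative value and adds the divisor once, which gives the same result.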



8.5.4 The Shifter 

The shifter is a right shift network with 64-bits of input and 32-bits of output. The input 
is E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0> concatenated to form a 64-bit word with
E_BUS%ABUS_L<31:0> in the more significant longword. The output is E_SHF%SHF_RESULT_H<31:0>,
which can be muxed onto E_BUS%WBUS_L<31:0> and is directly connected to the Q register (see 
Section 8.5.7). 

The shifter produces two condition code bits, SHF<N> and SHF<Z>. These are available on the
microtest bus and can be used to update the PSL. See Chapter 9 for more about the microtest bus 
and see Section 8.5.5 and Section 8.5.10.1 for more detail on setting PSL condition code bits.

The shifter shifts its input right by 0 to 32 bits. A shift amount of 0 selects
E_BUS%BBUS_L<31:0> and a shift amount of 32 selects E_BUS%ABUS_L<31:0>. The equivalent of
a left shift of N is accomplished by shifting left justified data (32-N) to the right. 

The shift operation is specified in the SHF field of the microword. The following table shows the 
shifter operations by name and gives a description of each operation. If the microword is in the 
special format, the shifter function defaults to NOP since the SHF field is not present. 



Table 8-4: Shifter Operations

Shifter Operation Name    Operation Description
----------------------    ---------------------
NOP                       E_SHF%SHF_RESULT_H <- UNPREDICTABLE
PASS.A                    E_SHF%SHF_RESULT_H <- A
PASS.B                    E_SHF%SHF_RESULT_H <- B
PASS.Z                    E_SHF%SHF_RESULT_H <- 0
LEFT.DOUBLE               E_SHF%SHF_RESULT_H <- A'B rsh 32 - count (the effect is LSH count)
LEFT.SINGLE               E_SHF%SHF_RESULT_H <- A'0 rsh 32 - count (the effect is LSH count)
RIGHT.DOUBLE              E_SHF%SHF_RESULT_H <- A'B rsh count
RIGHT.SINGLE              E_SHF%SHF_RESULT_H <- 0'B rsh count

' is the bitwise concatenation operator.



For the SHF/LEFT.SINGLE and SHF/RIGHT.SINGLE operations the shifter masks off E_BUS%BBUS_L<31:0> or
E_BUS%ABUS_L<31:0>, respectively. This guarantees that the bits shifted into the result are 0.

The shift amount comes from the VAL field of the microword or from the SC register. The SC 
register is the source of the shift amount if the VAL field is 0 or if the VAL field is not present 
because the microword is in the constant generation variant format. 

The SC register can specify an actual shift amount in the range of 0 to 31, and the VAL field can 
specify a shift amount of 1 to 31 (0 in VAL implies SC contains the shift amount). 

Neither the SC nor the VAL field can specify a shift of 32. However, since the SHF/LEFT.SINGLE and 
SHF/LEFT.DOUBLE operations differ from the corresponding right shift operations only in that the 
actual shift amount is the amount in the SC register or VAL field subtracted from 32 (32-N), the 
shifter shifts right by 32 when a left shift of 0 is specified. 
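
The shift-amount selection and the right-shift-only datapath described above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the chip's logic: the function names (shift_amount, funnel_rsh, shifter) are hypothetical, the operation names are taken from the shifter operations table, and NOP is modeled as returning None because its result is UNPREDICTABLE.

```python
MASK32 = 0xFFFFFFFF

def shift_amount(val_field, sc):
    """Pick the count N: VAL if present and nonzero (1..31), otherwise SC (0..31).

    val_field is None when the microword format has no VAL field (assumption
    used here to model the constant generation variant).
    """
    if val_field:
        return val_field & 0x1F
    return sc & 0x1F

def funnel_rsh(hi, lo, amount):
    """Shift the 64-bit concatenation hi'lo right by 0..32 and keep the low longword."""
    assert 0 <= amount <= 32
    combined = ((hi & MASK32) << 32) | (lo & MASK32)
    return (combined >> amount) & MASK32

def shifter(op, a, b, count):
    """Behavioral model of the shifter operations; every shift is a right shift."""
    if op == "NOP":
        return None                      # result is UNPREDICTABLE
    if op == "PASS.A":
        return a & MASK32
    if op == "PASS.B":
        return b & MASK32
    if op == "PASS.Z":
        return 0
    if op == "RIGHT.DOUBLE":
        return funnel_rsh(a, b, count)
    if op == "RIGHT.SINGLE":
        return funnel_rsh(0, b, count)   # A input masked off
    if op == "LEFT.DOUBLE":
        return funnel_rsh(a, b, 32 - count)
    if op == "LEFT.SINGLE":
        return funnel_rsh(a, 0, 32 - count)  # B input masked off
    raise ValueError("unknown shifter operation: " + op)
```

Note that a left shift of 0 reaches funnel_rsh with an amount of 32, which selects the A input unchanged. This is the only way the model ever produces a right shift of 32, matching the text above.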

8.5.4.1 Shifter Condition Codes 

The shifter condition codes are not dependent on the instruction data length. They are calculated 
always for longword data length. The two condition codes calculated by the shifter are: 

• SHF<Z> - Zero 

This bit indicates that the shifter result was zero. It is the logical NOR of 
E_SHF%SHF_RESULT_H<31:0>. 

• SHF<N> - Negative 

This bit indicates that the shifter result was negative. It is simply E_SHF%SHF_RESULT_H<31>. 

The shifter condition codes are available on the microtest bus and can be used to update the 
PSL. If the microword following the one setting the shifter condition codes is stalled, the Ebox 
control logic holds the shifter condition code bits constant until the microword branching on them 
is ready to use them. The effect is the same as if no stall had occurred. See Section 8.5.14 and 
Chapter 9 for more about the microtest bus and see Section 8.5.5 and Section 8.5.10.1 for more 
detail on setting PSL condition code bits. 

8.5.4.2 Shifter Sign 

The shifter sign, SHF<N>, is saved after each shifter operation including pass operations. A 
constant based on this saved value is available as an input to E_BUS%ABUS_L<31:0>. It is accessed 
by specifying SHIFT-SIGN in the A field of the microword. The constant is 0 or FFFFFFFF#16 for 
Saved-SHF<N> equal 0 or 1, respectively. Saved-SHF<N> is updated after each shifter operation 
and is held in each shifter NOP cycle. If microword N specifies a shifter operation, and 
microword N+1 sources this constant, the new value is used to form the constant. However, 
the Saved-SHF<N> may be destroyed by executing a special format microword. The bit is 
UNPREDICTABLE after executing such a microword. 
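
As an illustration of the shifter condition codes and the saved-sign constant, the following sketch computes both from a shifter result. The function names are hypothetical, and the model ignores the stall-hold behavior and the UNPREDICTABLE case after a special format microword.

```python
MASK32 = 0xFFFFFFFF

def shf_condition_codes(result):
    """Return (Z, N) for a shifter result; always evaluated at longword length."""
    result &= MASK32
    z = int(result == 0)        # NOR of all 32 result bits
    n = (result >> 31) & 1      # bit <31> of the result
    return z, n

def shift_sign_constant(saved_n):
    """Constant sourced onto the A bus from the saved SHF<N> bit."""
    return MASK32 if saved_n else 0
```

Because the evaluation is always longword, a result such as 000000FF (hex) gives N = 0 even when the macroinstruction data length is byte.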

8.5.5 RMUX and E_BUS%WBUS_L 

The RMUX coordinates Fbox and Ebox result storage and macroinstruction retiring. It is a 
large selector which selects the source of Ebox memory requests and the source of the next 
E_BUS%WBUS_L<31:0> data and associated information. The RMUX selection takes place in S4, as 
does the driving of the memory request to the Mbox. The new E_BUS%WBUS_L<31:0> data is not 
used until S5. 

The RMUX is controlled by the retire queue. See Section 8.5.15.7 for detail on the retire queue. 
The retire queue output is a status which indicates whether the next macroinstruction to retire 
is being executed in the Ebox or the Fbox. Based on this status, the RMUX selects one of the 
two boxes to drive E_BUS%WBUS_L<31:0> and to drive the memory request signals. The box not 
selected will stall if it needs to drive E_BUS%WBUS_L<31:0> or memory request signals. The 
retire queue read pointer is not advanced, and therefore the RMUX selection cannot change, until 
the currently selected box indicates that its macroinstruction is to be retired (except that the 
retire queue read pointer is not advanced when MISC1/RETIRE.INSTRUCTION is specified). 

NOTE 

The Ebox stalls when the microword does not specify NONE in the DST field and the 
retire queue selects the Fbox. It does not stall if the microword specifies DST/NONE, 
even if the same microword specifies a memory request. This is the reason for the 
microcode restriction that any microword specifying a memory operation must also 
specify DST/WBUS or something other than none in the DST field. See Section 8.5.27.15. 
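
The stall rule in the note above, and the microcode restriction that follows from it, can be stated as two predicates. This is a minimal sketch: the function names and the string encodings of the retire queue status and the DST field are assumptions made for illustration.

```python
def ebox_stalls_for_rmux(retire_queue_selects, dst_field):
    """True when the Ebox must stall in S4 because the Fbox owns the RMUX.

    The stall depends only on the DST field, not on whether the microword
    also specifies a memory request.
    """
    return retire_queue_selects == "FBOX" and dst_field != "NONE"

def violates_memory_dst_restriction(specifies_memory_op, dst_field):
    """True for a microword that requests memory but specifies DST/NONE.

    Such a microword would not stall while the Fbox is selected, which is
    why microcode is required to avoid it.
    """
    return specifies_memory_op and dst_field == "NONE"
```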

The source (Ebox or Fbox) indicated by the retire queue is always selected to drive the RMUX. If 
the Ebox is selected, the W field of the microword in S4 selects either the ALU or the shifter as the 
source of the RMUX. (Note that E_BUS%WBUS_L<31:0> is always driven, even if the Ebox specifies 
DST/NONE.) 

8.5.5.1 RMUX Produced Memory Request Signals 

The RMUX produced memory request signals are: 

• a memory command, 

• a status indicating a destination queue indirect memory store, 

• a tag giving a register file address in case a memory read is specified, 

• and the data length for the operation. 

This information is processed slightly further in the Ebox's Mbox interface logic to produce a 
memory request about halfway through S4. See Section 8.5.17 for more on Ebox memory requests. 

The only memory operation the Fbox can initiate is a destination queue indirect store (a memory 
store). If the Fbox is selected as the RMUX source, the memory request information comes from 
the Fbox and the destination queue. The destination queue is only accessed if the Fbox requests 
it. If it does not request a destination queue access, the memory information output by the RMUX 
indicates no operation. The Fbox also provides the data length if there is a store. 

If the Ebox is selected as the RMUX source, the memory request information comes from the 
microword. However, the DST field can cause a memory store request if it specifies a destination 
queue indirect store. The data length is from the DL register unless the microword L field overrides 
it to longword. The register file address for memory reads always comes from the DST field. 

8.5.5.2 RMUX Produced E_BUS%WBUS_L Related Information 

E_BUS%WBUS_L<31:0> carries result data from the Ebox and Fbox and is the only path by 
which macroinstruction results are written to memory or registers. The RMUX produced 
E_BUS%WBUS_L<31:0> related information is: 

• the E_BUS%WBUS_L<31:0> (a longword of data), 

• the E_BUS%WBUS_L<31:0> destination address or specification, 

• the data length associated with E_BUS%WBUS_L<31:0>, 

• the S5 condition codes, 

• and an indication of which condition code map is to be used. 

The above control information is driven into S5 provided there is not an S4 stall. If there is an 
S4 stall, S5 control information specifying no operation is driven into S5 instead. 

If the Fbox is selected, E_BUS%WBUS_L<31:0> data comes from the Fbox. The E_BUS%WBUS_L<31:0> 
destination address comes from the destination queue. The condition code bits and map 
specification come from the Fbox. The Fbox sets map specification code to specify no change 
of the condition code bits, except in the last cycle of an instruction retire when the map specifier 
specifies a particular condition code update. See Section 8.5.10.1.1 for more detail on condition 
code alteration. 

If the Ebox is selected, E_BUS%WBUS_L<31:0> data comes either from E_ALU%RESULT_H<31:0> or 
E_SHF%SHF_RESULT_H<31:0>. The condition codes come from the same source (ALU or shifter). 
Since the shifter only produces N and Z condition code bits, the RMUX substitutes 0 for S5 C and 
V bits if the shifter is selected. The E_BUS%WBUS_L<31:0> destination address comes from the DST 
field of the microword or from the destination queue. The status indicating whether the condition 
code bits are to be updated and the condition code map to be used are both decoded from the MISC 
field of the microword. 

In S5, E_BUS%WBUS_L<31:0> drives the W port of the register file and is the input to several 
miscellaneous registers in the Ebox. The condition codes and the map are used to update 
the PSL condition code bits if the map and associated status indicate this should happen. 
E_BUS%WBUS_L<31:0> is also the source of write data for any memory write request which was 
sent by the Ebox to the Mbox in the previous cycle. In other words E_BUS%WBUS_L<31:0>, in S5, 
is the source of write data for the memory operation selected by the RMUX in S4. 

In S5, E_BUS%WBUS_L<31:0> is zero extended according to the data length. The data length is 
from the DL register unless the microword L field overrides it to longword. E_BUS%WBUS_L<31:0> 
data is zero extended from the effective data length to longword. 
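
The S5 zero extension can be sketched as follows. The names and the string encoding of the data length are assumptions for illustration; the selection between the DL register and a forced longword follows the L field rule given above.

```python
DATA_LENGTH_BITS = {"BYTE": 8, "WORD": 16, "LONG": 32}

def effective_data_length(dl_register, l_field_forces_long):
    """Data length used by a microword: DL unless the L field overrides it to longword."""
    return "LONG" if l_field_forces_long else dl_register

def zero_extend_wbus(value, data_length):
    """Zero extend W bus data from the effective data length to a longword."""
    bits = DATA_LENGTH_BITS[data_length]
    return value & ((1 << bits) - 1)
```

For example, with DL set to byte and no L field override, the value 12345678 (hex) leaves the W bus as 00000078 (hex).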

8.5.6 VA Register 

The 32-bit VA register is the source for the address on all Ebox memory requests, except 
destination queue based stores which use the current PA queue entry for an address. Unlike the 
entry in the PA queue, the VA register address is not yet translated (though it may be a physical 
address). It is a virtual address except when the memory operation doesn't require translation 
(as in IPR references or explicit physical memory references) or when memory management is off. 

The VA register can be used to latch a temporary ALU output value without driving the ALU result 
onto E_BUS%WBUS_L<31:0>. 

The VA register can be loaded only from the output of the ALU, E_ALU%RESULT_H<31:0>. It is loaded 
when the microword V field specifies to load it. The load occurs at the end of S4, even when there 
is an S4 stall. If a given microword specifies a memory operation in the MRQ field and loads the 
VA register, the new VA value will be received by the Mbox with the memory command. For more 
detail on Ebox-initiated memory operations, see Section 8.5.17. 

NOTE 

The address for memory operations is part of the data latched in the EM_LATCH in the 
Mbox. This is why the Ebox can overwrite the VA value during S4 stalls even though 
the stall might be because the EM_LATCH is full. 

The VA register is one of the possible E_BUS%ABUS_L<31:0> sources. The microword specifies VA in 
the A field to use it. 

8.5.7 Q Register 

The 32-bit Q register is closely associated with the shifter. It can be loaded directly from the 
shifter output without driving that data onto E_BUS%WBUS_L<31:0> . Microcode uses it to hold 
temporary data. 

The Q register can only be loaded from the shifter output, E_SHF%SHF_RESULT_H<31:0>. It is 
loaded when the microword Q field specifies to load it. The load occurs at the end of S4, even 
when there is an S4 stall. 

The Q register is one of the possible sources of both E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0>. 
The microword specifies Q in the A or B field to use it. 

The data in the Q register is shifted one bit to the left or right as a side effect of the ALU SMUL.STEP 
and UDIV.STEP operations. The shift is one bit to the left for UDIV.STEP and one bit to the right for 
SMUL.STEP. 

8.5.8 Bypassing of Results 

The Ebox implements bypass paths for result data from S4 or S5 to E_BUS%ABUS_L<31:0> or 
E_BUS%BBUS_L<31:0>. These paths allow microwords to use any register in the register file as 
a source of E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0> even if the register has been updated 
by one of the two preceding microwords. The Ebox pipeline reads from the register file in S3, 
operates on the data in S4, and writes the register file in S5. Since adjacent microwords in the 
pipeline could be from entirely different macroinstruction execution microflows, it is necessary 
that the Ebox hardware detect and resolve cases where one microword alters a register and a 
subsequent microword reads that register before it is written. 

NOTE 

The Fbox is one possible source of result data in S4, and any S5 operation may be a 
result store operation from the Fbox piped forward one stage. Bypassing of results 
destined for the register file from S4 or S5 works for Fbox result store operations in 
the Ebox pipeline in the same way as for microcode operations. 

The Ebox monitors the register file addresses on the A and B ports of the register file in S3 
and compares those to the RMUX register file address in S4. Whenever E_BUS%ABUS_L<31:0> and 
E_BUS%BBUS_L<31:0> are expecting data that is not yet in the register file, the data is steered 
directly from the output of the RMUX (at the end of S4). 

NOTE 

The bypass path for register file entries from E_BUS%WBUS_L<31:0> in S5 to 
E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0> is implemented by register file flow-thru 
writes. E_BUS%WBUS_L<31:0> data is written into the register file early in the cycle and 
read after the write. So reads see the result of writes from the same cycle. 

The S3 A and B port addresses can come from the microword or the source queue. Similarly the 
RMUX address in S4 can come from the microword, the destination queue, or the Fbox. The W 
port address in S5 has already been determined by the RMUX in the previous cycle. The Ebox 
bypass path control logic compares the final S3 read addresses to the final S4 write addresses and 
enables the appropriate bypass path when there is a match. (As noted above, S5 to S3 register 
file bypass is a flow-thru path.) 

Data length has an effect on bypass operations for GPRs. When a pending GPR write is to less 
than a full longword, only the bytes which are going to be updated are bypassed. The other bytes 
are read from the register file. Effectively, an independent bypass check is made for each of the 
following: byte 0, byte 1, and the upper word. 

In the event that the W port and the RMUX update the same register, the bypass logic chooses the 
RMUX data as the source of E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0>. 

NOTE 

Note it would be possible for a value to be constructed of data from the register file, 
the RMUX, and the W port all at once, because of differing data lengths. 
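
The per-lane bypass check and the RMUX-over-W-port priority can be illustrated functionally. This is a sketch only: bypassed_read and its argument encoding are hypothetical, and the model merges values directly rather than reproducing the flow-thru register file write used for the S5 path.

```python
# Three independently checked lanes: byte 0, byte 1, and the upper word.
LANE_MASKS = (0x000000FF, 0x0000FF00, 0xFFFF0000)
# Number of lanes (counting from byte 0) covered by a write of each data length.
LANES_WRITTEN = {"BYTE": 1, "WORD": 2, "LONG": 3}

def bypassed_read(regfile_value, w_port=None, rmux=None):
    """Value seen on the A or B bus when reading a GPR with pending writes.

    w_port and rmux are (value, data_length) tuples describing a pending
    write to the same GPR from S5 and S4 respectively, or None.
    """
    result = 0
    for lane, mask in enumerate(LANE_MASKS):
        source = regfile_value
        # The RMUX is checked last so that it wins over the W port.
        for pending in (w_port, rmux):
            if pending is not None and lane < LANES_WRITTEN[pending[1]]:
                source = pending[0]
        result |= source & mask
    return result
```

With a pending word write on the W port and a pending byte write from the RMUX, the model returns byte 0 from the RMUX, byte 1 from the W port, and the upper word from the register file, which is the three-source case mentioned in the note.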

In the event that the IW port (from the Ibox) and the RMUX update the same register, the bypass 
logic chooses the RMUX data as the source of E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0>. See 
Section 8.5.1.3 for more on IW port bypass. 

The Q and VA register updates are also effectively bypassed. Microcode can depend upon the 
new data being available to E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0> when the preceding 
microword updated these registers. However, the Q register contents are not bypassed if the Q 
register was updated by a shift caused by an ALU/SMUL.STEP or ALU/UDIV.STEP ALU operation. 

NOTE 

The bypass mechanisms for the VA and Q registers are based on a flow-thru latch 
updated in S4 (and not stalled) rather than actual bypass paths. Neither bypass is 
data length dependent, as writes to these registers always load the entire longword. 

Bypassing for other registers and states in the Ebox generally does not make sense, and therefore 
is not implemented. For example, there is no bypass associated with the INT.SYS register or the 
PSL. 

8.5.9 Result Destinations 

Most of the Ebox result destinations receive their data from E_BUS%WBUS_L<31:0> in 
S5. Destinations specified in the DST field of the microword are updated in S5 from 
E_BUS%WBUS_L<31:0>. Possible E_BUS%WBUS_L<31:0> destinations are any register file entry, the 
PSL and SC registers, and the MMGT.MODE and INT.SYS special registers. More detail on the 
miscellaneous registers is given in the next section. 

A number of special capabilities for loading registers are available through the MISC field of the 
microword. 

• The DL (data length) register can be altered in S3, affecting the next microword but not the 
current one. 

• The SC register can be updated directly from E_BUS%ABUS_L<4:0> in S4 (overriding an S5 
update from the preceding microword). 

• The MPU (mask processing unit, see Section 8.5.10.7) can be updated directly from 
E_BUS%BBUS_L<29:16> in S4. 

8.5.10 Miscellaneous Ebox Registers and States 

There are a number of states and registers in the Ebox with special purposes. Some, like the 
DL register, provide control information. Some provide status signals used by Microsequencer 
conditional branches. They also vary in how and when they are loaded. 

8.5.10.1 PSL 

The PSL is the VAX architecture PSL register. Its bits are used in several places within the 
Ebox. The Microsequencer uses a number of the bits to make dispatching decisions. Additionally 
the current mode is used by the Mbox and the IPL level is used by the interrupts section (see 
Chapter 10 for more on interrupts). 

The PSL can be loaded as a longword or byte destination of E_BUS%WBUS_L<31:0> in S5. There are 
two different decodes of the DST microword field which load the PSL, DST/PSL and DST/PSL.B0. The 
first loads the entire PSL. The second loads only the low-order byte of the PSL. 

8.5.10.1.1 Condition Code Alteration 

The condition code bits of the PSL can be altered independently. This occurs when the MISC field 
of the microword specifies one of the six possible PSL condition code update functions. Condition 
code update also occurs when the Fbox retires a macroinstruction. The update occurs at the 
end of S5. The resulting bits can be used in the next cycle (for example, the second following 
microword can source the PSL). 

The new condition codes are a logic function (called a map) of the current PSL condition codes and 
the new S5 condition codes. The S5 condition codes in any cycle were selected in the previous 
cycle by the RMUX from the shifter, ALU, and Fbox condition codes. The map specifier is an 
output of the RMUX. It is either supplied by the Ebox or the Fbox. The six different condition code 
update functions available through the MISC field of the microword indicate six different maps. 
The Fbox derives its map from the opcode of the macroinstruction it is executing. 

The following tables show all the different condition code alteration maps. Table 8-5 shows the 
microcode specified maps used for macroinstructions executed in the Ebox. Table 8-6 shows the 
maps used for macroinstructions executed in the Fbox. 



Table 8-5: Condition Code Alteration Maps Specified By Microcode 

MISC Field Specification   Map Function 

LOAD.PSL.CC.IIIP           PSL<N,Z,V> <- S5 Condition Codes <N,Z,V> 
                           PSL<C> <- PSL<C> (unchanged) 

LOAD.PSL.CC.JIZJ           PSL<N> <- S5 Condition Code <N> XOR S5 Condition Code <V> 
                           PSL<Z> <- S5 Condition Code <Z> 
                           PSL<V> <- 0 
                           PSL<C> <- NOT S5 Condition Code <C> 

LOAD.PSL.CC.IIII           PSL<N,Z,V,C> <- S5 Condition Codes <N,Z,V,C> 

LOAD.PSL.CC.IIIJ           PSL<N,Z,V> <- S5 Condition Codes <N,Z,V> 
                           PSL<C> <- NOT S5 Condition Code <C> 

LOAD.PSL.CC.IIIP.QUAD      PSL<Z> <- PSL<Z> AND S5 Condition Code <Z> 
                           PSL<N,V> <- S5 Condition Codes <N,V> 
                           PSL<C> <- PSL<C> (unchanged) 

LOAD.PSL.CC.PPJP           PSL<V> <- NOT S5 Condition Code <Z> 
                           PSL<N,Z,C> <- PSL<N,Z,C> (unchanged) 
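
The six microcode-specified maps can be written out as a single function of the current PSL condition codes and the S5 condition codes. This is an illustrative sketch: apply_cc_map is a hypothetical name, the dictionary representation is an assumption, and the short map keys (IIII, IIIP, IIIJ, JIZJ, IIIP.QUAD, PPJP) are the mnemonic suffixes as read from the map functions in the table.

```python
def apply_cc_map(map_name, psl, s5):
    """Return the new PSL condition codes.

    psl and s5 are dicts with keys 'N', 'Z', 'V', 'C' holding 0 or 1.
    """
    new = dict(psl)  # start from "unchanged"
    if map_name == "IIII":
        new.update(N=s5["N"], Z=s5["Z"], V=s5["V"], C=s5["C"])
    elif map_name == "IIIP":
        new.update(N=s5["N"], Z=s5["Z"], V=s5["V"])
    elif map_name == "IIIJ":
        new.update(N=s5["N"], Z=s5["Z"], V=s5["V"], C=s5["C"] ^ 1)
    elif map_name == "JIZJ":
        new.update(N=s5["N"] ^ s5["V"], Z=s5["Z"], V=0, C=s5["C"] ^ 1)
    elif map_name == "IIIP.QUAD":
        new.update(N=s5["N"], Z=psl["Z"] & s5["Z"], V=s5["V"])
    elif map_name == "PPJP":
        new.update(V=s5["Z"] ^ 1)
    else:
        raise ValueError("unknown condition code map: " + map_name)
    return new
```

The IIIP.QUAD map is the only one that combines old and new Z bits, which lets a quadword result built from two longword steps report zero only when both halves are zero.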



Table 8-6: Condition Code Alteration Maps Used By The Fbox 

Map Specifier Value                  Map Function 

0                                    No change to the PSL condition code bits. 

1 (used for MOVF, MOVD, MOVG)        PSL<N,Z> <- S5 Condition Codes <N,Z> 
                                     PSL<V> <- 0 
                                     PSL<C> <- PSL<C> (unchanged) 

2 (used for most floating point      PSL<N,Z> <- S5 Condition Codes <N,Z> 
instructions)                        PSL<V,C> <- 0 

3 (used for MULL and some convert    PSL<N,Z,V> <- S5 Condition Codes <N,Z,V> 
instructions)                        PSL<C> <- 0 



8.5.10.1.2 Trace and Trace Pending Bits 

When the first microword of a macroinstruction execution microflow reaches S5, the PSL<T> 
bit is copied into the PSL<TP> bit. (Macroinstruction execution microflows are distinguished 
from other microflows by a status bit sent from the Microsequencer. See Section 8.5.14.3.) 
The Microsequencer receives both these bits and causes a trap fault dispatch when necessary. 
The Microsequencer anticipates the setting of PSL<TP> when it dispatches a macroinstruction 
execution microflow so that it will dispatch to the trace fault handler on the next 
SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW. (See Section 9.2.3.3.2.) 



8.5.10.2 SC 

The SC register is a 5-bit register which holds a shifter shift amount. The microword can specify 
left and right shifts of the amount in the SC register. A microword specifies this one of two ways. 
If the constant generation variant of the microword is used, the SC register is always the source 
of the shift amount. Also, the SC register is the shift amount source if the microword is not a 
constant generation variant and the VAL field is zero. 

The SC register can be loaded in two different ways. One way is to specify DST/SC, specifying the 
SC as the destination of E_BUS%WBUS_L<4:0>. The other way is to specify MISC/LOAD.SC.FROM.A. In 
this case the SC register is loaded from E_BUS%ABUS_L<4:0>. 

The E_BUS%WBUS_L<4:0> load into SC occurs at the end of S5. The E_BUS%ABUS_L<4:0> load occurs 
at the end of S4. In either case, the new value is not seen by the shifter until the next cycle. 
The shifter can use the old SC value during the current cycle. The SC control logic ensures that 
the following case works the same way with and without a stall on the second microword. If 
microword N loads the SC register off E_BUS%WBUS_L<4:0>, and microword N+1 shifts some data 
by the amount in the SC register, the data will be shifted according to the value in SC as microword 
N began. 

If two different microwords each specify a load of the SC in the same cycle, the E_BUS%ABUS_L<31:0> 
data is loaded. This can only happen if one microword specifies DST/SC and the following 
microword specifies MISC/LOAD.SC.FROM.A. The more recently executed microword wins. (Note 
that this means the result when a stall delays the second microword is the same as if there is no 
stall.) 
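
The priority between the two SC load paths can be sketched as a next-state function. The name next_sc and its arguments are hypothetical; the model covers only the case of loads arriving in the same cycle and does not model pipeline aborts or stalls.

```python
def next_sc(current_sc, wbus_load=None, abus_load=None):
    """SC value the shifter sees in the next cycle.

    wbus_load: W bus data for a DST/SC load completing at the end of S5, or None.
    abus_load: A bus data for a LOAD.SC.FROM.A load completing at the end of S4,
               or None. When both occur in one cycle the A bus load comes from
               the more recently executed microword, so it wins.
    """
    if abus_load is not None:
        return abus_load & 0x1F   # only bits <4:0> are loaded
    if wbus_load is not None:
        return wbus_load & 0x1F
    return current_sc
```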

NOTE 

If an Ebox pipeline abort occurs, it does not necessarily prevent the modification of the 
SC register by a microword in the pipeline. If a microword which would alter the SC 
in S5 (i.e., specifies DST/SC) enters S5 in a pipeline abort cycle, the SC is loaded despite 
the abort. Effectively, the SC register is UNPREDICTABLE after a pipeline abort (though 
if a particular case is analyzed carefully, it may be possible to determine that the SC is 
predictable in that case). 

8.5.10.3 INT.SYS 

INT.SYS is a possible E_BUS%ABUS_L<31:0> source and a possible E_BUS%WBUS_L<31:0> destination. 
It is microcode's interface to the interrupt section. Both as a source and as a destination, INT.SYS 
is a longword. For information on the format and use of the register, see Chapter 10. The register 
is read in S3 and written in S5. 

8.5.10.4 MMGT.MODE 

The MMGT.MODE register is a 2-bit E_BUS%WBUS_L destination. It is loaded from 
E_BUS%WBUS_L<3:2> early in S5. Its value is used in memory management probe accesses (MRQ 
field specifies PROBE.V.RCHK, PROBE.V.RCHK.NOFILL or PROBE.V.WCHK). The Ebox drives this mode 
directly to the Mbox. For more detail on Ebox-Mbox interaction see Section 8.5.17. 

8.5.10.5 State Flags 

There are 6 1-bit state flags: 0 through 5. Microcode can conditionally branch on these bits. 
They can be set and cleared by microcode, and some are cleared automatically at the start of 
each macroinstruction execution microflow. The state flags are used as microcode flags for loops 
and shared microcode paths. 

The state bits are maintained in S3. If the state bits are altered in a microword, a branch based 
on the new state may be specified in the next microword. It is possible to set or clear state flags 
and branch on the previous value in the same microword. 

The following table shows the microword fields and specifications used to set and reset state 
flags. 

Table 8-7: Setting and Clearing State Flags 

MISC Field in Standard Format Microword 
Mnemonic                 Operation 

MISC/CLR.STATE.3-0       Clear State Flags 0-3 
MISC/SET.STATE.0         Set State Flag 0 
MISC/SET.STATE.1         Set State Flag 1 
MISC/SET.STATE.2         Set State Flag 2 

MISC1 Field in Special Format Microword 
Mnemonic                 Operation 

MISC1/CLR.STATE.5-4      Clear State Flags 4 and 5 
MISC1/SET.STATE.3        Set State Flag 3 
MISC1/SET.STATE.4        Set State Flag 4 
MISC1/SET.STATE.5        Set State Flag 5 



At the start of each macroinstruction (macroinstruction and FPD dispatches in the 
microsequencer), in S3, state flags 0 through 3 are reset. If the first microword of the 
macroinstruction execution microflow sets any of the state flags, it will override the automatic 
reset for the particular state bit(s) specified; the others are still cleared. 

The state flag bits may be selected onto the microtest bus for use in microcode branches. See 
Section 8.5.14 and Chapter 9 for more on microcode branches. 
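
The interaction between the automatic reset of state flags 0 through 3 and an explicit set in the first microword can be sketched as below. The function name and the bit-vector representation (flag n in bit n) are assumptions; pipeline abort effects are not modeled.

```python
def state_flags_after_s3(flags, set_bits=(), clear_bits=(), first_microword=False):
    """Return the 6-bit state flag vector after a microword passes through S3.

    set_bits and clear_bits list the flag numbers the microword sets or clears.
    first_microword marks the first microword of a macroinstruction microflow.
    """
    new = flags & 0b111111
    if first_microword:
        new &= ~0b001111          # automatic reset of flags 0 through 3
    for bit in clear_bits:
        new &= ~(1 << bit)
    for bit in set_bits:
        new |= 1 << bit           # an explicit set overrides the automatic reset
    return new & 0b111111
```

Flags 4 and 5 are untouched by the automatic reset, so they carry over from one macroinstruction to the next unless microcode clears them explicitly.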

8.5.10.5.1 E%MACHINE_CHECK_H 

If state flag 5 is 1 and state flag 4 is 0, the signal E%MACHINE_CHECK_H is asserted. This causes 
pin P%MACHINE_CHECK_H to be asserted. 

8.5.10.5.2 State Flags and Pipeline Abort 

The state flags are maintained in S3. If a microword which specifies to set or clear state flags 
enters S3, the flags are altered. Also, the automatic reset of state flags 0 through 3 at the start of 
a new macroinstruction execution flow occurs when the associated microword is in S3. In either 
of these cases, a pipeline abort (due to a microtrap) in S4 for the associated microword will not 
prevent the state flag modification. When microcode intends that the state flags not be altered 
by a specific flow if it is aborted by a microtrap, special rules must be followed. 

There are two cases. If the anticipated microtrap can only occur with microword N in S3, 
microword N+1 can specify an alteration of a state flag and it will not happen if the microtrap 
occurs. If the anticipated microtrap can only occur with microword N in S4, and microword N+1 
alters a state flag, that state flag will be affected even if the microtrap occurs. In this case, 
microword N+2 may alter a state flag and it will not happen if the microtrap occurs. 

If it is not predictable whether microword N will be in S3 or S4 when the anticipated microtrap 
occurs, then the obvious extrapolation of the above explanation determines the result. 

Here is an example case in which microword N is guaranteed to be in S3 when an anticipated 
microtrap occurs: 

• Microcode issued an explicit memory read to a Wn register and microword N sources Wn to 
the E_BUS%ABUS_L<31:0> to synchronize the operation. The anticipated microtrap is associated 
with the memory read to Wn. 

Here are some example cases in which microword N is guaranteed to be in S4 when an anticipated 
microtrap occurs: 

• Microword N sources an MD to the E_BUS%ABUS_L<31:0> (through the source queue) to 
synchronize to an operand prefetch issued by the Ibox. The microtrap is associated with 
the operand which is to be returned to the MD. 

• Microword N synchronizes to an explicit memory reference in microword N-1 by specifying 
MRQ/SYNC.MBOX. The microtrap is associated with the memory reference issued by microword 
N-1. 

Microcode which intends to avoid the side effect in which state flags 0 through 3 are cleared in 
the first cycle of a macroinstruction microflow if a microtrap occurs may have to add a microword 
after the one synchronizing to the anticipated microtrap before specifying SEQ.MUX/LAST.CYCLE or 
SEQ.MUX/LAST.CYCLE.OVERFLOW. Specifically, if microword N synchronizes to an anticipated microtrap 
in S4 and microword N+1 specifies SEQ.MUX/LAST.CYCLE, then state flags 0 through 3 will not be 
cleared if the microtrap occurs. However, if microword N specifies SEQ.MUX/LAST.CYCLE, the state 
flags could be cleared (though it would depend on the detailed timing of the events). 

8.5.10.6 DL Part of the Instruction Context Register 

The DL is one field of the instruction context register. It contains the initial data length for the 
macroinstruction which is being executed in the Ebox. The data length is determined by the Ibox 
and passed to the Microsequencer in the instruction queue. The Microsequencer enters the DL 
into the instruction context register, along with other instruction context information. It is used 
by the Ebox as the default data length for each microword. Each microword specifies use of the 
data length in the DL or use of a data length of longword. The L field of the microword determines 
this. The operations affected by data length are: 

• Calculation of the ALU condition codes. 

The four condition codes are determined according to the data length. (For example, the 
ALU<N> is bit <31>, <15>, or <7> for longword, word, or byte length operations, respectively.) 

• Zero Extending of E_BUS%WBUS_L<31:0> data. 

E_BUS%WBUS_L<31:0> data is zero extended from the specified data length to longword. 

• The size of a memory operation initiated by this microword. 

This affects all memory operations except result stores to the current PA queue entry address. 
(PA queue entries contain the data length used for the store operation.) 

• Register File GPR Writes. 

GPR writes from E_BUS%WBUS_L<31:0> are gated by the data length such that only the bytes 
in that data length are affected by the write and others are unchanged. (Writes from the MD 
and IW ports to the GPRs are not affected by the DL.) 
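
Two of the data-length effects listed above, the position of ALU<N> and the gating of GPR writes, can be illustrated with a short sketch. The function names and the string encoding of the data length are assumptions made for illustration.

```python
LENGTH_BITS = {"BYTE": 8, "WORD": 16, "LONG": 32}

def alu_n(result, data_length):
    """ALU<N>: the sign bit of the result at the given data length."""
    return (result >> (LENGTH_BITS[data_length] - 1)) & 1

def gpr_write(old_value, wbus, data_length):
    """New GPR contents after a W bus write gated by the data length.

    Only the bytes inside the data length are replaced; the rest keep
    their old contents.
    """
    mask = (1 << LENGTH_BITS[data_length]) - 1
    return (old_value & ~mask & 0xFFFFFFFF) | (wbus & mask)
```

A byte write of AABBCCDD (hex) into a GPR holding 11223344 (hex) therefore leaves 112233DD (hex).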

The DL field in the instruction context register can be modified by specifying DL.BYTE, DL.WORD, or 
DL.LONG in the MISC field of the microword. The effect is to set the DL to byte, word, or longword 
data length, respectively. The old DL value applies to operations in the current microword. The 
new DL value applies to the next microword. 

See Section 8.5.14.1 for more on the instruction context register. 

8.5.10.7 Mask Processing Unit 

The mask processing unit (MPU) holds and processes a 14-bit value. The value is loaded from 
E_BUS%BBUS_L<29:16> when the microword specifies LOAD.MPU.FROM.B in the MISC field. The MPU 
outputs a set of bits with which the microcode can carry out an eight-way branch. They are 
MPU0_6<2:0> and MPU7_13<2:0>. The purpose of this is to allow microcode to quickly process bit 
masks in macroinstruction execution microflows for CALLG, CALLS, RET, FFC, FFS, POPR, and PUSHR. 

The MPU loads a 14-bit value from E_BUS%BBUS_L<29:16> when the microword specifies 
it. This occurs in S4. The MPU evaluates the input, producing the values on MPU0_6<2:0> 
and MPU7_13<2:0> shown in the table below. MPU0_6<2:0> depends only on mask bits <6:0> and 
MPU7_13<2:0> depends only on mask bits <13:7>. 

Table 8-8: MPU Calculation 

MPU0_6<2:0> Truth Table 

Mask<6:0>    MPU0_6<2:0> 

XXXXXX1      000 
XXXXX10      001 
XXXX100      010 
XXX1000      011 
XX10000      100 
X100000      101 
1000000      110 
0000000      111 

MPU7_13<2:0> Truth Table 

Mask<13:7>   MPU7_13<2:0> 

XXXXXX1      000 
XXXXX10      001 
XXXX100      010 
XXX1000      011 
XX10000      100 
X100000      101 
1000000      110 
0000000      111 

All values shown in binary. X = don't care 



Microcode can branch on the MPU7_13<2:0> or MPU0_6<2:0> values after they are loaded. The initial 
processing is done by the end of the S4 cycle which loaded the MPU. When microcode does branch 
on one of these values, the least significant bit which is 1 in the current mask value in the MPU 
is reset to 0 automatically. This occurs in S3, so that the next microword can branch on the new 
value of the mask. (The MPU bit clear does not occur in a cycle in which there is an S3 stall.) 
The MPU detects that the microword entering S3 specifies an eight-way branch on MPU7_13<2:0> or 
MPU0_6<2:0> by examining the E_USQ%UTSEL_H<4:0> and E_USQ%UTSEL_L<4:0> bits. If they specify 
an MPU branch, the appropriate bit is reset. 
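
The MPU encoding and the automatic bit clear can be modeled in a few lines. This is a sketch under assumptions: the function names are hypothetical, the clear is applied to the least significant 1 of the whole 14-bit mask as the text states, and mask_bit_order is only an illustration of how a microcode loop might walk a register mask, not a description of any actual microflow.

```python
def mpu_encode(mask7):
    """3-bit output for one 7-bit half: index of the lowest 1 bit, or 7 if none."""
    for i in range(7):
        if mask7 & (1 << i):
            return i
    return 7

def mpu_outputs(mask14):
    """Return (MPU0_6, MPU7_13) for a 14-bit mask."""
    return mpu_encode(mask14 & 0x7F), mpu_encode((mask14 >> 7) & 0x7F)

def clear_lowest_set_bit(mask14):
    """Side effect of an MPU branch: reset the least significant 1 in the mask."""
    return mask14 & (mask14 - 1)

def mask_bit_order(mask14):
    """Illustrative loop: list the mask bit numbers in the order they are consumed."""
    order = []
    while mask14:
        low, high = mpu_outputs(mask14)
        order.append(low if low != 7 else 7 + high)
        mask14 = clear_lowest_set_bit(mask14)
    return order
```

An output of 7 (binary 111) from either half signals that no bits remain in that half, which is what lets an eight-way branch double as the loop exit test.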

If a load of a new MPU mask value is simultaneous with a microcode MPU branch, the new data is 
loaded correctly without any side effect due to the branch. This occurs when a microword specifies 
LOAD.MPU.FROM.B and the immediately following microword does a branch on the previous mask 
value. The branch is an S3 operation of the second microword, while at the same time the load 
is an S4 operation of the first. (The branch outcome is guaranteed to reflect the MPU value before 
the load.) 
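
The encoding and the automatic bit clear can be illustrated with a small software model. This is only a sketch under stated assumptions, not a description of the hardware: the class and method names are hypothetical, the mask is assumed to occupy B-bus bits <29:16>, and the bit clear is assumed to apply to the half of the mask that the branch tests.

```python
def mpu_encode(mask7: int) -> int:
    """Return the 3-bit MPU code for one 7-bit half of the mask.

    The code is the position of the least significant 1 bit,
    or 7 (binary 111) when no bit is set.
    """
    for position in range(7):
        if mask7 & (1 << position):
            return position
    return 7


class MaskProcessingUnit:
    """Hypothetical software model of the MPU mask register."""

    def __init__(self) -> None:
        self.mask = 0  # 14-bit mask value

    def load_from_bbus(self, bbus: int) -> None:
        # Assumption: the mask is taken from B-bus bits <29:16>.
        self.mask = (bbus >> 16) & 0x3FFF

    def mpu0_6(self) -> int:
        return mpu_encode(self.mask & 0x7F)

    def mpu7_13(self) -> int:
        return mpu_encode((self.mask >> 7) & 0x7F)

    def branch_on_mpu0_6(self) -> int:
        """Return the branch code, then clear the lowest 1 in mask<6:0>."""
        code = self.mpu0_6()
        low = self.mask & 0x7F
        low &= low - 1  # clears the lowest set bit; no effect when low == 0
        self.mask = (self.mask & ~0x7F) | low
        return code

    def branch_on_mpu7_13(self) -> int:
        """Return the branch code, then clear the lowest 1 in mask<13:7>."""
        code = self.mpu7_13()
        high = (self.mask >> 7) & 0x7F
        high &= high - 1
        self.mask = (self.mask & 0x7F) | (high << 7)
        return code
```

Repeated branches on the same half therefore step through the set bits of that half from least to most significant, ending with code 111 once the half is empty.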

8.5.11 Branch Condition Evaluator 



The branch condition evaluator uses the macroinstruction opcode, the ALU condition code bits, 
the PSL condition code bits, and E_SHF%SHF_RESULT_H<0> to evaluate the branch condition for 
all macroinstruction conditional branches. The evaluation is done in every cycle but is only used 
if the microword specifies SYNC.BDISP.TESTPRED in the MRQ field. The result of the evaluation is 
compared with the Ibox prediction for the branch. The Ibox prediction is indicated in the current 
branch queue entry. If the Ibox prediction was not correct, the Ebox signals the Ibox and sends 
a branch misprediction trap request to the Microsequencer. 

The branch condition evaluation is begun late in S4 and finished early in S5. All the information 
needed to perform the evaluation is gathered late in S4. The PSL condition code bits used in the 
comparison are bypassed; they are the bits which will be latched into the PSL at the end of S4. 
The ALU condition code bits used are generated late in S4 and are dependent on the data length 
for the instruction. The shifter result bit is also generated late in S4. The opcode is available 
early in S4 and is used to set up the evaluation. 

In S5, the result of the branch condition evaluation is compared with the Ibox prediction, and 
E%BCOND_RETIRE_L is asserted to tell the Ibox that a branch queue entry for a conditional branch 
was removed from the branch queue. If the prediction was not correct, the Ebox also asserts 
E%BRANCH_MISPREDICT_L which is received by the Ibox and Microsequencer. The Microsequencer 
forces a branch mispredict microtrap beginning in the next cycle when E%BRANCH_MISPREDICT_L 
is asserted. If E%BCOND_RETIRE_L is asserted and E%BRANCH_MISPREDICT_L is not, the Ibox 
releases the resource which is holding the alternate PC (the address which the branch should have 
gone to if the prediction was not correct). If E%BCOND_RETIRE_L and E%BRANCH_MISPREDICT_L 
are both asserted, the Ibox begins unwinding the ELOG and fetching instructions from the 
alternate PC. In this case, the microtrap in the Ebox will cause the Ebox and Fbox pipelines to 
be purged and the various Ibox-Ebox queues to be flushed. Also, E%FLUSH_MBOX_H is asserted, 
flushing Mbox processing of Ebox operand accesses other than writes. See Section 8.5.19 for 
more on Ebox handling of microtraps. See Chapter 9 for more on dispatching a microtrap. See 
Chapter 7 for more on activity surrounding branch misprediction. 
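
The Ibox response to the two signals can be summarized as a small decision function. The sketch below is illustrative only: the function name and the returned strings are hypothetical, the active-low signals are modeled as booleans that are True when asserted, and the case of a mispredict without a retire is not described in the text and is treated here as no action.

```python
def ibox_branch_response(bcond_retire: bool, branch_mispredict: bool) -> str:
    """Model the Ibox reaction to E%BCOND_RETIRE_L and E%BRANCH_MISPREDICT_L.

    Both arguments are True when the corresponding signal is asserted.
    """
    if not bcond_retire:
        # No conditional branch entry left the branch queue this cycle.
        # Assumption: a mispredict without a retire does not occur.
        return "no action"
    if branch_mispredict:
        # Prediction was wrong: unwind and restart from the alternate PC.
        return "unwind and fetch from alternate PC"
    # Prediction was right: the alternate PC is no longer needed.
    return "release alternate PC"
```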

The branch macroinstruction has entered S5 and is therefore retired even in the event of a 
misprediction. It is the macroinstructions following the branch in the pipeline which must be 
prevented from completing in the event of a misprediction trap. 

The following shows all the cases the branch condition evaluator handles. The macroinstruction 
opcode and mnemonic are given along with the boolean equation used to determine if the branch 
is taken. 



Table 8-9: Branch Condition Evaluation 

Instruction     Opcode    Branch Taken Condition 

BNEQ, BNEQU     12        NOT PSL<Z> 
BEQL, BEQLU     13        PSL<Z> 
BGTR            14        NOT (PSL<N> OR PSL<Z>) 
BLEQ            15        PSL<N> OR PSL<Z> 
BGEQ            18        NOT PSL<N> 
BLSS            19        PSL<N> 
BGTRU           1A        NOT (PSL<C> OR PSL<Z>) 
BLEQU           1B        (PSL<C> OR PSL<Z>) 
BVC             1C        NOT PSL<V> 
BVS             1D        PSL<V> 
BGEQU, BCC      1E        NOT PSL<C> 
BLSSU, BCS      1F        PSL<C> 
SOBGEQ          F4        NOT ALU<N> 
SOBGTR          F5        NOT (ALU<N> OR ALU<Z>) 
AOBLSS          F2        ALU<N> XOR ALU<V> 
AOBLEQ          F3        (ALU<N> XOR ALU<V>) OR ALU<Z> 
ACBB            9D        (ALU<N> XOR ALU<V>) OR ALU<Z> 
ACBW            3D        (ALU<N> XOR ALU<V>) OR ALU<Z> 
ACBL            F1        (ALU<N> XOR ALU<V>) OR ALU<Z> 
BBS             E0        E_SHF%SHF_RESULT_H<0> 
BBC             E1        NOT E_SHF%SHF_RESULT_H<0> 
BBCS            E3        NOT E_SHF%SHF_RESULT_H<0> 
BBSC            E4        E_SHF%SHF_RESULT_H<0> 
BBCC            E5        NOT E_SHF%SHF_RESULT_H<0> 
BBSSI           E6        E_SHF%SHF_RESULT_H<0> 
BBCCI           E7        NOT E_SHF%SHF_RESULT_H<0> 
BLBS            E8        E_SHF%SHF_RESULT_H<0> 
BLBC            E9        NOT E_SHF%SHF_RESULT_H<0> 
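
The branch taken conditions lend themselves to a table-driven model. The following sketch transcribes only the rows listed in Table 8-9; the names CC, BRANCH_TAKEN, branch_taken, and mispredicted are hypothetical, and the timing of the evaluation (late S4 to early S5, with bypassed PSL bits) is not modeled.

```python
from typing import NamedTuple


class CC(NamedTuple):
    """Condition code bits N, Z, V, C (PSL or ALU)."""
    n: bool = False
    z: bool = False
    v: bool = False
    c: bool = False


def _add_compare(psl: CC, alu: CC, shf0: bool) -> bool:
    # (ALU<N> XOR ALU<V>) OR ALU<Z>, shared by AOBLEQ and the ACBx opcodes
    return (alu.n != alu.v) or alu.z


# Opcode (hex) -> predicate(psl, alu, shf0), one entry per row of Table 8-9.
BRANCH_TAKEN = {
    0x12: lambda psl, alu, shf0: not psl.z,             # BNEQ, BNEQU
    0x13: lambda psl, alu, shf0: psl.z,                 # BEQL, BEQLU
    0x14: lambda psl, alu, shf0: not (psl.n or psl.z),  # BGTR
    0x15: lambda psl, alu, shf0: psl.n or psl.z,        # BLEQ
    0x18: lambda psl, alu, shf0: not psl.n,             # BGEQ
    0x19: lambda psl, alu, shf0: psl.n,                 # BLSS
    0x1A: lambda psl, alu, shf0: not (psl.c or psl.z),  # BGTRU
    0x1B: lambda psl, alu, shf0: psl.c or psl.z,        # BLEQU
    0x1C: lambda psl, alu, shf0: not psl.v,             # BVC
    0x1D: lambda psl, alu, shf0: psl.v,                 # BVS
    0x1E: lambda psl, alu, shf0: not psl.c,             # BGEQU, BCC
    0x1F: lambda psl, alu, shf0: psl.c,                 # BLSSU, BCS
    0xF4: lambda psl, alu, shf0: not alu.n,             # SOBGEQ
    0xF5: lambda psl, alu, shf0: not (alu.n or alu.z),  # SOBGTR
    0xF2: lambda psl, alu, shf0: alu.n != alu.v,        # AOBLSS
    0xF3: _add_compare,                                 # AOBLEQ
    0x9D: _add_compare,                                 # ACBB
    0x3D: _add_compare,                                 # ACBW
    0xF1: _add_compare,                                 # ACBL
    0xE0: lambda psl, alu, shf0: shf0,                  # BBS
    0xE1: lambda psl, alu, shf0: not shf0,              # BBC
    0xE3: lambda psl, alu, shf0: not shf0,              # BBCS
    0xE4: lambda psl, alu, shf0: shf0,                  # BBSC
    0xE5: lambda psl, alu, shf0: not shf0,              # BBCC
    0xE6: lambda psl, alu, shf0: shf0,                  # BBSSI
    0xE7: lambda psl, alu, shf0: not shf0,              # BBCCI
    0xE8: lambda psl, alu, shf0: shf0,                  # BLBS
    0xE9: lambda psl, alu, shf0: not shf0,              # BLBC
}


def branch_taken(opcode: int, psl: CC, alu: CC, shf0: bool) -> bool:
    """Evaluate the branch taken condition for a listed opcode."""
    return bool(BRANCH_TAKEN[opcode](psl, alu, shf0))


def mispredicted(opcode: int, psl: CC, alu: CC, shf0: bool,
                 predicted_taken: bool) -> bool:
    """True when the evaluated outcome differs from the Ibox prediction."""
    return branch_taken(opcode, psl, alu, shf0) != predicted_taken
```

For example, mispredicted(0x12, CC(z=True), CC(), False, predicted_taken=True) is True, because BNEQ is not taken when PSL<Z> is set.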



8.5.12 Miscellaneous Ebox Operand Sources 

Generally Ebox operand sources are registers in the register file or other registers. Certain 
sources are read type accesses to Ebox states, special results calculated automatically, or access 
to a data path not normally used as an operand source. In some cases data which can be accessed 
in another way is arranged in a special format as a source. 

8.5.12.1 S+PSW_EX 

The S+PSW_EX E_BUS%ABUS_L<31:0> source is simply a bit from the macroinstruction opcode and 
several bits from the PSL. It saves microcode steps in the CALLS and CALLG macroinstructions. 
Figure 8-4 shows the format of this longword source. 

Bit <29> comes from the instruction context register (OPCODE<0>). Bits <7:5> come from the PSL 
register. 



8.5.12.2 Population Counter 

The Population Counter is an Ebox function unit which calculates four times the number of 
ones in E_BUS%ABUS_L<13:0> every cycle. Its result is available as an E_BUS%ABUS_L<31:0> source 
to the following microword. It saves microcode steps in the CALLS, CALLG, POPR, and PUSHR 
macroinstructions. 

The Population Counter calculates a result in the range 0 to 14*4 equal to four times the number 
of ones in E_BUS%ABUS_L<13:0> early in S4. If microword N steers data to E_BUS%ABUS_L<31:0>, 
microword N+1 can access the Population Counter result for that data by specifying POP.COUNT 
in the A field. If microword N+1 is stalled in S3, Ebox control logic holds the Population Counter 
result until the stall ends. The effect is the same as if no stall had occurred. 

The Population Counter's result is used to calculate the extent of the stack frame which will 
be written by the macroinstruction. The two ends of the stack frame are checked for memory 
management purposes before any writes are done. 
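
The calculation itself is simple to state in software. The sketch below assumes only what is described above (the low 14 bits of the A-bus value are counted and the count is multiplied by four); the function name is hypothetical.

```python
def pop_count_x4(abus: int) -> int:
    """Return four times the number of 1 bits in abus<13:0>.

    Each set bit corresponds to one longword (4 bytes), so the
    result lies in the range 0 to 14*4 = 56.
    """
    return 4 * bin(abus & 0x3FFF).count("1")
```

A mask with three bits set among bits <13:0> yields 12, the number of bytes occupied by three longwords. Bits above <13> do not contribute.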



8.5.12.3 RN.MODE.OPCODE 

RN.MODE.OPCODE is a longword composite source used when the microcode needs to access one of 
these data items. The four data fields in this register are RN<3:0>, CUR_MOD<1:0>, OPCODE<7:0>, 
and the VAX_RESTART_BIT. Figure 8-5 shows the position of these fields in the longword. This 
longword is one of the possible E_BUS%BBUS_L<31:0> sources. It is read in S3. 

The RN<3:0> field is really a special data path. Its value is the GPR number in the current source 
queue entry. The following restrictions apply: The A field of the microword must specify S1 (the 
current source queue output), and the microcode must know from context that the source queue 
entry points to a GPR. If these restrictions are not met, the value returned in the RN field is 
UNPREDICTABLE. 

Figure 8-5: RN.MODE.OPCODE E_BUS%BBUS_L<31:0> Source 

The CUR_MOD<1:0> field is simply the access mode of the current process; it is taken directly from 
PSL<25:24>. 

The OPCODE<7:0> field is the opcode from the most recent macroinstruction execution dispatch. 
It is taken from the instruction context register in S3. This instruction context register field has 
9 bits. The 9th bit indicates the first byte of the opcode was FD (hex). The opcode portion of the 
RN.MODE.OPCODE source does not include the 9th bit. 

The VAX_RESTART_BIT field is the VAX Restart Bit which indicates that the most recently dispatched 
macroinstruction execution microflow has not altered a GPR or initiated a memory write operation 
of some kind. It is used to indicate to the operating system that a macroinstruction which 
encountered some error hasn't modified any architectural state. See Section 8.5.13 for more 
detail. 



8.5.12.4 PMFCNT Register 

The PMFCNT register, which is part of the performance monitoring facility, is available as an 
E_BUS%ABUS_L<31:0> source. See Chapter 18 and Figure 18-4. 

8.5.13 VAX Restart Bit 

The VAX Restart Bit is used to keep track of whether the currently executing macroinstruction 
has altered any architecturally visible state. It is only used by macrocode handling machine 
check exceptions. Conceptually, the Ebox hardware resets this bit anytime a GPR is altered or 
a memory write or store is initiated and sets it anytime a new macroinstruction begins. Often 
there is more than one macroinstruction in the NVAX pipeline, making maintenance of the VAX 
Restart Bit somewhat tricky. 

As is described in Section 8.5.19, microtraps for faults are always taken at the end of S4, before 
the microword can advance to S5. The VAX Restart Bit is set or reset only when operations advance 
to S5 and there is no pipeline abort in that cycle. 

The VAX Restart Bit is reset each time a microword which alters a GPR or specifies any memory 
write is advanced into S5. The bit is reset in S5 when a read is sent to the Mbox and the read 
data is to be returned to a GPR, since that event actually writes the data on E_BUS%WBUS_L<31:0> 
into the specified GPR. 

The memory operations specified in the MRQ field which cause the VAX Restart Bit to be reset are: 

• WRITE.V.WCHK and 

• WRITE.V.UNLOCK. 

In addition, all microwords specifying DST/DST reset the VAX Restart Bit since destination queue 
indirect stores are either memory stores or GPR writes. 

The VAX Restart Bit is set each time a microword which causes dispatch to an execution 
microflow is advanced into S5, or when microcode handles a trap exception by retiring the current 
instruction and dispatching to the exception handler in microcode. Specifically, it is set when 
MISC/RETIRE.INSTRUCTION or SEQ.MUX/LAST.CYCLE is specified by the microword in S5. The set 
always overrides the reset when both conditions exist in the same cycle. So the bit is reset when 
a microword which alters a GPR or writes or stores to memory is in S5 and that microword does 
not specify MISC/RETIRE.INSTRUCTION or SEQ.MUX/LAST.CYCLE. 
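
The update rule for the Ebox case can be written as a next-state function. This is a minimal sketch, assuming the three inputs summarize the microword advancing into S5; the function and parameter names are hypothetical and Fbox retires are not modeled.

```python
def next_restart_bit(current: bool,
                     alters_gpr_or_memory: bool,
                     retires_or_last_cycle: bool,
                     pipeline_abort: bool = False) -> bool:
    """Return the VAX Restart Bit after a microword advances into S5.

    alters_gpr_or_memory:  the microword alters a GPR, specifies a memory
                           write, or specifies a destination queue
                           indirect store.
    retires_or_last_cycle: the microword specifies MISC/RETIRE.INSTRUCTION
                           or SEQ.MUX/LAST.CYCLE.
    pipeline_abort:        there is a pipeline abort in this cycle, so the
                           bit is neither set nor reset.
    """
    if pipeline_abort:
        return current
    if retires_or_last_cycle:
        return True   # the set always overrides the reset
    if alters_gpr_or_memory:
        return False
    return current
```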

When the Fbox is retiring results, the VAX Restart Bit is maintained properly. It is reset if the 
Fbox stores a result in memory or the register file (that is, it is reset on any destination queue 
indirect store from the Fbox). It is set when the Fbox asserts F%RETIRE_H, retiring the current 
Fbox instruction. (Note that this is not the cycle in which the microword which initiated the Fbox 
instruction is in S4; this is the cycle in which the Fbox sends the result of the operation to the 
Ebox.) As with Ebox retires, the set overrides the reset. 

The VAX Restart Bit doesn't detect all changes to architecturally visible states. Microcode takes 
explicit action when it is about to alter some architecturally visible state other than memory or 
a GPR. It can, for example, copy a GPR to itself before changing the other state in question. 

The VAX Restart Bit is read out in S3 but is maintained in S5. The value of this bit isn't useful 
if the pipeline is executing macroinstructions normally. It is useful only when a machine check 
exception has been detected. Since the VAX Restart Bit is updated in mid S5, it won't report a 
memory or GPR write until the second microword after the one which does the write. 

The VAX Restart Bit is read through the RN.MODE.OPCODE E_BUS%BBUS_L<31:0> source. See 
Section 8.5.12.3. 

8.5.14 Ebox-Microsequencer Interface 

The Ebox receives the data path control part of the microword and the macroinstruction context 
information from the Microsequencer at the beginning of S3. It also receives a few signals 
indicating the circumstances accompanying the fetch of the microword. The Ebox sends many 
states which are needed for conditional branches to the Microsequencer from various points in 
the Ebox pipeline. The Microsequencer uses these states for conditional branch calculation. 

8.5.14.1 Instruction Context Register 

The Microsequencer latches macroinstruction information at the beginning of each 
macroinstruction execution microflow, including FPD microflows. This information was originally 
created in the Ibox and entered in the instruction queue. At some point the Microsequencer 
extracted that information along with a control store dispatch address. The Microsequencer 
pipelines this information so that it becomes visible to the Ebox at the same time as the microword 
from the dispatch address is clocked into the MIB Latch. The Microsequencer holds this data 
until the next time the first microword of a macroinstruction enters S3. See Section 9.2.3.3.4 and 
Section 9.2.3.3.4.1. 

Except for the DL data, the Ebox simply carries the instruction context data down the pipeline. In 
the Ebox, the DL register is loaded with the DL data when the first microword of a macroinstruction 
is in S3. This latch can be altered under microcode control. See Section 8.5.10.6. 

The information passed by the Microsequencer to the Ebox is made up of the following fields: 

• Macroinstruction Opcode; Instruction Context<OPCODE> = Instruction Context<12:4> 

The ninth bit indicates FD (hex) was the first opcode byte. This data is carried down the Ebox 
pipeline. It is used in S3 as a source of data and for microcode conditional branches. In S4/S5 
it is used in the conditional branch evaluator. 

• Data Length; Instruction Context<DL> = Instruction Context<3:2> 

The Ebox holds this initial instruction data length in the DL register. 

• Fbox Instruction Flag; Instruction Context<FI> = Instruction Context<1> 

This bit is asserted if the opcode is for any macroinstruction which is normally executed in 
the Fbox. The Ebox enters it in the retire queue and uses it to force a reserved opcode fault for 
Fbox instructions when the Fbox is disabled. 
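
The field positions listed above can be expressed as a decode routine. This is a sketch only: the function name and dictionary keys are hypothetical, and the ninth opcode bit is assumed to be the most significant bit of the field (Instruction Context<12>), which the text does not state explicitly.

```python
def decode_instruction_context(context: int) -> dict:
    """Split an instruction context value into its fields.

    Field positions: OPCODE = <12:4>, DL = <3:2>, FI = <1>.
    """
    opcode_field = (context >> 4) & 0x1FF      # 9-bit opcode field
    return {
        "opcode": opcode_field & 0xFF,          # low 8 bits: opcode byte
        # Assumption: the ninth bit is the top bit of the field.
        "fd_prefix": bool(opcode_field >> 8),   # first opcode byte was FD (hex)
        "dl": (context >> 2) & 0x3,             # initial data length code
        "fbox_instruction": bool((context >> 1) & 0x1),
    }
```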

The Microsequencer signals that a new microflow begins with the accompanying microword and 
macroinstruction context information. If the new microflow is due to a macroinstruction, the 
Ebox latches the DL<1:0> data. The DL value can be altered by microcode, so a special latch is 
implemented in S3 for it. The opcode is simply carried along the pipeline. It remains latched in 
the Microsequencer until the next new macroinstruction flow is dispatched, so it is not latched 
explicitly in the Ebox. This instruction context information is available to any microword in the 
associated macroinstruction's execution microflow. 

The floating point instruction flag is also entered in the retire queue when a new microflow is for 
a macroinstruction. For more detail on the retire queue see Section 8.5.15.7. 

The macroinstruction context information is carried down the pipeline with each microword. 
The context information stalls when the microword stalls. The opcode is used in S4 and S5 to 
determine conditional branch results. The DL is used to control the ALU in S4, the size for any 
memory request in S4, E_BUS%WBUS_L<31:0> zero extension in S5, and GPR byte write-enables in 
S5. The floating point instruction flag is used in S3 to determine how to handle source operand 
faults. 

The DL register can be altered by microcode. This occurs when the microword specifying the 
change is in S4. If new instruction context information enters S3 at the same time as a microword 
specified DL alteration occurs, the instruction context load overrides the microword specified 
alteration. This is because the instruction context load is for the microword subsequent to the 
microword specifying the DL alteration. 

8.5.14.2 Microtest Fields 

The Ebox provides most of the information used by the Microsequencer for microcode branches. 
The condition bits are driven onto the microtest bus when the Microsequencer requests it by 
driving the select code on E_USQ%UTSEL_H<4:0> and E_USQ%UTSEL_L<4:0>. The condition data is 
driven early in the cycle after it is computed. The following table shows the information the 
Ebox can supply. It gives the source and pipeline segment in which the data is driven. This 
condition information is tested in S3, as specified by the SEQ.COND field in the Microsequencer 
control part of the microword. The S3 operation determines the address of the next microword. 
So data delivered by the Ebox when microword N is in S3 is used by microword N+l to select 
microword N+2. If the data is driven while microword N is in S4 or S5, one or two more cycles 
of microbranch latency are required, respectively. 
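
The latency rule can be restated as simple arithmetic on microword indices. The sketch below assumes one microword enters S3 per cycle with no stalls; the names are hypothetical.

```python
# Extra cycles of microbranch latency by the stage in which microword N
# drives the condition data.
EXTRA_CYCLES = {"S3": 0, "S4": 1, "S5": 2}


def first_testing_microword(n: int, driven_in: str) -> int:
    """Index of the earliest microword that can test data driven by microword n."""
    return n + 1 + EXTRA_CYCLES[driven_in]


def first_selected_microword(n: int, driven_in: str) -> int:
    """Index of the earliest microword whose address depends on that data."""
    return first_testing_microword(n, driven_in) + 1
```

For data driven in S3 by microword N, this gives N+1 as the testing microword and N+2 as the selected microword, as stated above.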

Table 8-10: Ebox Sourced Microbranch Conditions 

Condition                                  Pipeline Stage (condition bit driven at end of segment) 

ALU<Z>, ALU<C>, ALU<V>, ALU<N>             S4 
SHF<N>, SHF<Z>                             S4 
E_BUS%ABUS_L<3L16>l-i,7i6>                 S3 
E_BUS%ABUS_L<13> OR E_BUS%ABUS_L<12>       S3 
E_BUS%BBUS_L<&3,2iO>                       S3 
E_BUS%BBUS_L^M>> EQ 0                      S3 
E_BUS%BBUS_L<15A> NEQ 0                    S3 
MPU0_6<2:0>, MPU7_13<2:0>                  S4 
State Flags 0-5                            S3 
Opcode<2:0>                                S2 (see note 1) 
PSL<29,26:22>                              S5 
VECTOR_PRESENT                             always stable; configuration status bit (not 
                                           used by NVAX microcode, see Section 8.5.18) 
FBOX.ENABLE                                always stable; configuration status bit 
Field queue status - valid, and reg_mode   always accessible; Ibox-Ebox queue 
Fbox fault code (see Section 8.5.19.7)     effectively always stable; not valid except in 
                                           microtrap for Fbox faults 

Note 1: bypass or flow-thru design required so first microword of a macroinstruction execution flow 
can specify a conditional branch on its macroinstruction opcode. 



See Chapter 9 for more on microbranches. 

8.5.14.3 Miscellaneous Microsequencer Signals 

The Microsequencer provides the Ebox with several control signals. They signal certain 
Microsequencer events which have Ebox side effects. 

The Microsequencer signals E_USQ%UTSEL_H<4:0> and E_USQ%UTSEL_L<4:0> are used in early S3 
by the Ebox to detect that one of the MPU conditional branches (MPU0_6 or MPU7_13) is decoded 
from the Microsequencer control part of the microword. The Ebox clears the appropriate bit in 
the mask stored in the MPU by the end of S3. See Section 8.5.10.7 for more detail. 

The Microsequencer signals E_USQ%UTSEL_H<4:0> and E_USQ%UTSEL_L<4:0> are used early in 
S3 by the Ebox to detect that the field queue status conditional branch is decoded from the 
Microsequencer control part of the microword. The Ebox retires an entry from the field queue if 
the entry was valid at the time the branch was evaluated. See Section 8.5.15.8 for more detail. 

NOTE 

E_USQ%UTSEL_H<4:0> and E_USQ%UTSEL_L<4:0> are derived almost directly from the 
SEQ.COND field of the Microsequencer control part of the microword. See Chapter 9. 

The Microsequencer asserts E_USQ%MACRO_1ST_CYCLE_H when the microword in S3 is the 
first microword of a macroinstruction execution microflow (including the microflow at the FPD 
dispatch). The Ebox sets all the Wn register valid bits and resets state flags 0-3 as a result of this 
signal. Both effects occur in S3. It also copies PSL<T> into PSL<TP> once the microword reaches 
S5. Also, the Ebox latches the new instruction context DL value at the beginning of S3. 

The Microsequencer asserts E_USQ%PE_ABORT_L when a microtrap is initiated. In this cycle all 
the control latches in the Ebox pipeline are flushed. Also, the Ebox flushes the retire queue. 

The Microsequencer asserts E_USQ%IQ_STALL_H when the microword in S2 is the STALL microword 
(see Section 8.5.20.1). This status is carried down the Ebox pipeline along with the microword. 
The status is asserted (and the microword is the STALL microword) only when the Microsequencer 
required an instruction queue entry but no entry was valid. When this status is true, and the 
Ibox is asserting one of its memory error signals, the Ebox assumes a memory error in fetching 
the opcode byte(s) occurred. This is piped forward to S3 and then treated like any other S3 
detected fault. A microtrap is forced when the condition is clocked into S4. See Section 8.5.19. 
The STALL microword status is also used by the Ebox S3 stall timeout logic (see Section 8.5.25.1). 

Two fields from the Microsequencer control portion of the microword are decoded by the Ebox. 
These fields are SEQ.MUX and SEQ.FMT. The Ebox determines when these fields decode to 
the operation LAST.CYCLE or LAST.CYCLE.OVERFLOW. See Chapter 9 for more on the format of 
the Microsequencer control portion of the microword. The decoded status is carried down 
the Ebox pipeline with the other decodes of the microword. When a microword specifying 
SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW is advanced into S5, the Ebox signals the 
Ibox that a macroinstruction is retiring (except if the microword specifies DISABLE.RETIRE/YES). 
See Section 8.5.15.9 for more detail. 

When a microword specifying SEQ.MUX/LAST.CYCLE.OVERFLOW is advanced into S5, and the PSL<IV> 
and PSL<V> bits are both set, the Ebox signals the Microsequencer that an integer overflow 
microtrap should occur. 

8.5.14.4 Miscellaneous Ebox-to-Microsequencer Signals 

The Ebox sends the Microsequencer several PSL bits which affect new microflow dispatching 
(dispatching in response to SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW). They are 
PSL<T, TP, and FPD>. When the Microsequencer next decodes a SEQ.MUX/LAST.CYCLE or 
SEQ.MUX/LAST.CYCLE.OVERFLOW operation, if PSL<FPD> or PSL<TP> is set, it dispatches to special 
microflows (a different microflow for FPD than for TP) instead of the next macroinstruction 
execution microflow. If it dispatches for FPD (first part done), the Microsequencer removes an 
entry from the instruction queue and sends the instruction context information to the Ebox. For 
TP (trace fault) dispatches, the instruction queue is not referenced and the instruction context 
register is not loaded. 

When PSL<T> is set at instruction dispatch time (including dispatching for FPD), the 
Microsequencer sets a local copy of the PSL<TP> bit, called LOCAL_TP (see Section 9.2.3.3.2). If 
LOCAL_TP or PSL<TP> is set at the time of a dispatch for a macroinstruction, the instruction queue 
reference does not occur and a trace fault dispatch occurs instead. This could happen on the 
very next cycle after the macroinstruction dispatch with PSL<T> set and PSL<TP> not set. The 
Microsequencer sets LOCAL_TP during the first dispatch cycle so that it can affect the immediately 
subsequent dispatch. 

The Ebox asserts the signal E_PSL%PSL_IS_DST_S5_H in S5 of any cycle in which the entire PSL 
is being updated (i.e., if only the low byte of the PSL is updated, E_PSL%PSL_IS_DST_S5_H is not 
asserted). The Microsequencer clears LOCAL_TP when this signal is asserted. Note that the 
Microsequencer will initiate a trace fault dispatch if PSL<TP>, LOCAL_TP, or both are set. 
So if a new PSL with PSL<TP> set is loaded, the trace fault dispatch will occur at the correct point. 
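
The interaction of PSL<T>, PSL<TP>, and LOCAL_TP can be modeled as a small state machine. This is an illustrative sketch under stated assumptions: the class and method names are hypothetical, FPD dispatches are left out, and PSL<TP> is supplied by the caller rather than derived from the Ebox pipeline.

```python
class TraceDispatchModel:
    """Hypothetical model of the Microsequencer's LOCAL_TP handling."""

    def __init__(self) -> None:
        self.local_tp = False

    def dispatch(self, psl_t: bool, psl_tp: bool) -> str:
        """Decide what is dispatched at the end of a microflow."""
        if self.local_tp or psl_tp:
            # Instruction queue is not referenced; trace fault instead.
            return "trace fault"
        if psl_t:
            # Set during this dispatch so it can affect the very next one.
            self.local_tp = True
        return "macroinstruction"

    def psl_is_dst_s5(self) -> None:
        """E_PSL%PSL_IS_DST_S5_H asserted: the entire PSL is being updated."""
        self.local_tp = False
```

With PSL<T> set and PSL<TP> clear, the first dispatch starts a macroinstruction and the immediately following dispatch is a trace fault, even if PSL<TP> has not yet been written.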

NOTE 

There is a microcode restriction which disallows specifying SEQ.MUX/LAST.CYCLE or 
SEQ.MUX/LAST.CYCLE.OVERFLOW in the two microwords following one which loads the 
PSL. An exception to this rule is made when none of the PSL bits which affect new 
microflow dispatching will be changed. Some microflows know from context that none 
of these bits will change in a given PSL write (for example, in the execution microflow 
for the CALL macroinstruction, several bits in the low byte of PSL are cleared, but 
<T, TP, and FPD> are unaffected). 

8.5.15 Ebox-Ibox Interface 

The Ibox to Ebox interface is made up of a number of FIFO queues which carry operand information 
to the Ebox. These are the source queue, destination queue, field queue, and branch queue, 
which carry source operand information, destination operand information, type information for 
bit field operands, and branch related information, respectively. These queues are part of the 
Ebox. The Ibox generally processes instructions ahead of the Ebox. As it processes operand 
specifiers it adds entries to one or more of the queues. Each specific macroinstruction execution 
microflow always removes the same number of entries from each queue as the Ibox adds (unless 
an exception occurs). With this buffering, the Ibox and Ebox operate independently enough that 
stalls or latencies in one box don't necessarily cause a stall in the other, resulting in greater 
overall execution speed. 

See Chapter 7 for more detail on many of the topics in this section. 

The Ebox maintains macroinstruction ordering information in the retire queue. This FIFO is not 
part of the Ibox to Ebox interface, but is closely related. The Ebox is both the supplier and the 
consumer of retire queue entries. 

In any of the queues described in this section an entry which hasn't been added is said to be 
invalid. Except in the case of the field queue, a stall (S3 for source queue, S4 for destination 
queue and branch queue) results when the microword references a queue entry which isn't valid. 
This stall ends when the Ibox adds enough entries to fulfill the microword's request. 

In any of the queues described here, adding an entry means writing an entry, and moving the 
write pointer to the next entry in the queue. Accessing or referencing an entry means reading 
an entry, and moving the read pointer to the next entry in the queue. Where it is needed, status 
information concerning the number of valid entries in a queue is generated by examining the 
read and write pointers of that queue. 
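
The add and access operations can be sketched as a pointer-based FIFO. This is a minimal model, not the hardware design: the class name is hypothetical, the pointers are assumed to be free-running counters (reduced modulo the queue size when indexing), and a reference to an invalid entry is represented by returning None where the hardware would stall.

```python
class PointerFifo:
    """Hypothetical model of an Ibox-Ebox queue with read/write pointers."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots = [None] * size
        self.write_ptr = 0  # advanced when an entry is added
        self.read_ptr = 0   # advanced when an entry is accessed

    def valid_entries(self) -> int:
        # Status is derived only from the two pointers.
        return self.write_ptr - self.read_ptr

    def add(self, entry) -> None:
        """Write an entry and move the write pointer to the next entry."""
        if self.valid_entries() == self.size:
            raise OverflowError("queue overrun")
        self.slots[self.write_ptr % self.size] = entry
        self.write_ptr += 1

    def access(self):
        """Read an entry and move the read pointer, or return None (stall)."""
        if self.valid_entries() == 0:
            return None
        entry = self.slots[self.read_ptr % self.size]
        self.read_ptr += 1
        return entry
```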

8.5.15.1 Ibox Counters 

The Ibox has three counters which prevent queue overrun. Two counters are used to keep track 
of the number of entries in the source and destination queues, one for the source queue (allowing 
12 entries) and one for the destination queue (allowing 6 entries). The Ibox increments these 
counters when it adds entries. The Ebox notifies the Ibox when it retires entries from the source 
or destination queue, and the Ibox decrements the counters in response. 

Another counter in the Ibox keeps track of the number of macroinstructions which have been 
sent to the Ebox but have not been retired. This limits the number of entries in the retire queue, 
branch queue and field queue because there can be no more than one entry in each of these queues 
for any given macroinstruction. The counter allows up to 6 instructions in the Ebox/Fbox at a 
time. The Ibox increments this counter when it adds an entry to the instruction queue. When 
the Ebox signals the Ibox that a macroinstruction is retiring, the Ibox decrements the counter. 
This happens in S5 of the Ebox pipeline, one or two stages after the stage in which entries are 
removed from these queues. Note that this same mechanism limits the number of instruction 
queue entries to 6. 

NOTE 

The limit of one field queue entry per macroinstruction is simply an NVAX convention. 
The VAX Architecture does not include instructions which have more than one bit field 
base address operand specifier, but NVAX defines other operands as field type where it 
simplifies the implementation. 

The Ibox also has a counter to keep track of the number of available MD registers. It increments 
this counter when it allocates an MD to hold operand data (e.g., when it initiates a read of operand 
data from memory to an MD). When the Ebox retires a source queue entry, it tells the Ibox whether 
the entry pointed to an MD. The Ibox decrements the counter when the Ebox retires a source queue 
entry which pointed to an MD. It is possible for the Ebox to retire two source queue entries in 
one cycle, and the Ibox decrements the counter by two when both source queue entries pointed 
to MDs. 
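
The overrun protection amounts to credit counting. The sketch below models the source queue, destination queue, and instruction counters with the limits given above; the class and method names are hypothetical, and the MD register counter is omitted because its limit is not stated in this section.

```python
class IboxCounters:
    """Hypothetical model of the Ibox counters which prevent queue overrun."""

    LIMITS = {"source": 12, "destination": 6, "instruction": 6}

    def __init__(self) -> None:
        self.in_use = {name: 0 for name in self.LIMITS}

    def can_add(self, queue: str, entries: int = 1) -> bool:
        """True if the Ibox may add this many entries without overrun."""
        return self.in_use[queue] + entries <= self.LIMITS[queue]

    def add(self, queue: str, entries: int = 1) -> None:
        """The Ibox increments the counter when it adds entries."""
        if not self.can_add(queue, entries):
            raise RuntimeError(f"{queue} queue would overrun")
        self.in_use[queue] += entries

    def retire(self, queue: str, entries: int = 1) -> None:
        """The Ebox reports retired entries; the Ibox decrements the counter."""
        if entries > self.in_use[queue]:
            raise RuntimeError(f"{queue} counter would underflow")
        self.in_use[queue] -= entries
```

Two source queue entries can be retired in one cycle, which corresponds to retire("source", 2) in this model.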

8.5.15.2 Source Queue 

The source queue carries source operand information. The information is either literal mode data 
(6 bits) or a pointer into the register file. If it is a register file pointer, it either points to a GPR or to 
an MD register. The Ebox accesses one or two source queue entries per cycle in S3. Source queue 
accesses always cause data to be sourced to E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0>. Literal 
mode data is zero extended and driven directly onto the specified bus. Otherwise the contents 
of the location in the register file pointed to by the source queue entry is fetched. If the register 
which is accessed is not valid or is marked for writing by the Fbox, then the appropriate S3 stall 
occurs. 

Figure 8-6 shows a source queue entry. The VALUE field is either a register file address or a 6-bit 
literal data value. If it is a register file address, it points to either a GPR or MD register. SH_LIT 
indicates whether VALUE is short literal data (if SH_LIT is 1, VALUE is short literal data). 

Source queue entries are made for read, modify, address, and field operands. Both a source queue 
and a destination queue entry are made for each modify operand. 






NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 

Figure 8-6: A Source Queue Entry 

 06 05 04 03 02 01 00 
+--+--+--+--+--+--+--+ 
|  |      VALUE      | 
+--+--+--+--+--+--+--+ 
  | 
  +--- SH_LIT 



Field operands in NVAX are classified into read and modify types. Read and modify field operands 
both result in a source queue entry. Modify field operands also result in a destination queue entry 
if the operand specifier is register mode. 

Two source queue entries are made for quadword length operands. If they are for registers, they 
point to registers N and N+1. If they are memory operands, they point to MD registers which 
will receive data from memory addresses A and A+4. For literal mode, the first value is the 
immediate data, and the second is 0. 
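
The entry format and the quadword rule can be illustrated as follows. This sketch assumes the layout suggested by Figure 8-6 (SH_LIT in bit 6, VALUE in bits <5:0>) and assumes that the second literal entry is also marked as a short literal; the function names are hypothetical, and memory operands are not modeled because MD register allocation is not described here.

```python
SH_LIT = 1 << 6  # assumed position of the short literal flag


def make_entry(value: int, short_literal: bool) -> int:
    """Pack a source queue entry from VALUE and SH_LIT."""
    return (SH_LIT if short_literal else 0) | (value & 0x3F)


def decode_entry(entry: int) -> tuple:
    """Return (is_short_literal, value) for a source queue entry."""
    return bool(entry & SH_LIT), entry & 0x3F


def quadword_entries(kind: str, value: int) -> list:
    """Return the two source queue entries made for a quadword operand.

    kind is "register" (value is GPR number N) or "literal"
    (value is the 6-bit short literal).
    """
    if kind == "literal":
        # First entry carries the literal data, second entry is 0.
        return [make_entry(value, True), make_entry(0, True)]
    if kind == "register":
        # Entries point to registers N and N+1.
        return [make_entry(value, False), make_entry(value + 1, False)]
    raise ValueError("memory operands are not modeled in this sketch")
```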

Source queue access fulfills a necessary synchronization function. When microcode successfully 
accesses a source queue entry it knows that the Ibox was able to fetch the associated operand 
specifier. It also knows that there is no access violation or invalid translation condition associated 
with the operand. For modify type operands it also knows that the location will not give an access 
violation when written. Microcode for complex macroinstructions always references all source 
operands which might cause a memory management fault before altering any architecturally 
visible state. 

The number of entries in the source queue is 12. 

8.5.15.3 Destination Queue 

The destination queue carries destination operand information. The information is either an 
address in the register file of a GPR or a status indicating a memory write to the address in the 
PA queue in the Mbox. The destination queue is accessed in S4 (no more than one entry per cycle 
is used). Its information is used to decide how to write the result which is being calculated by 
the ALU, shifter, or Fbox in the same cycle. If the destination queue entry indicates a memory 
store, the request is sent to the Mbox. An S4 stall occurs if the Mbox is already busy or the PA 
queue entry is not ready. If the destination queue entry indicates a GPR write, the register file 
will be written using the address from the destination queue. The GPR write occurs in the next 
cycle (S5). 

Figure 8-7 shows a destination queue entry. The VALUE field is either a register file address or is 
unused. If it is a register file address, it points to a GPR. MDEST indicates whether the destination 
of the data is memory. If MDEST is 0, the result is destined for the register file and VALUE field 
indicates the destination address. If MDEST is 1, the destination of the data is memory and the 
VALUE field is unused. 

Destination queue entries are made for modify and write access type operands. Also, modify field 
operands result in a destination queue entry if the operand specifier is register mode. 

Figure 8-7: A Destination Queue Entry 

Two destination queue entries are made for quadword length operands. If they are for registers, 
they point to registers N and N+1. For memory operands they point to addresses A and A+4. 

Destination queue access fulfills a necessary synchronization function. If the destination queue 
entry is accessed and used successfully, microcode knows that the destination operand specifier 
was fetched successfully and that there will be no access violation when the destination location 
(if it is in memory) is written. In the case of quadword data length, successful use of the first 
destination queue entry guarantees that the second write will not incur a memory management 
exception either. 

The destination queue contains the Fbox destination scoreboard function. See Section 8.5.16.4 
for more information. 

The number of entries in the destination queue is 6. 

8.5.15.4 Miscellaneous Queue Retire Information 

When an entry is retired from the source or destination queues, certain information is sent back 
to the Ibox. The Ibox uses this information to maintain three counter values and to maintain 
GPR scoreboard information in the scoreboard unit (SBU). 

Zero or one destination queue entry can be retired in a given cycle. The retire information sent 
to the Ibox for the destination queue is: 

• whether an entry is being retired, 

• whether the entry being retired indicates a GPR write or a memory write, and 

• the GPR number if it is a GPR write. 

The Ebox signals the Ibox when a destination queue retire occurs. It does this early in the cycle 
in which the operation is advanced into S5. 

Zero, one, or two source queue entries can be retired in a given cycle. Similar information is 
sent for each of the two source queue read ports. The retire information sent to the Ibox for each 
source queue read port is: 

• whether an entry is being retired, 

• whether the entry being retired indicates a GPR read, an MD read, or is short literal data, and 

• the GPR number if it is a GPR read. 

The Ebox signals the Ibox when one or two source queue retires occur. It does this early in the 
cycle in which the microword retiring the source queue entries is advanced into S4. 

8.5.15.5 Branch Queue 

The branch queue carries information for conditional and unconditional branches. The 
information is a one-bit prediction status. The prediction status is only used by conditional 
branches. It indicates which way the Ibox predicted the conditional branch would go. The Ebox 
references the branch queue for two reasons: to synchronize with the Ibox fetch of the branch 
displacement and to compare the Ibox branch prediction to the actual branch result. 

The Ebox accesses the branch queue in S4 when the microword specifies SYNC.BDISP, 
SYNC.BDISP.RETIRE, or SYNC.BDISP.TEST.PRED in the MRQ field. SYNC.BDISP.RETIRE is used in 
unconditional branches. SYNC.BDISP.TEST.PRED is used in all conditional branches. SYNC.BDISP 
is used in some complex conditional branches. The Ibox doesn't add an entry to the branch queue 
until it has successfully fetched the displacement. When the Ebox accesses the branch queue, it 
will stall until there is an entry. This stall occurs in S4 and prevents the branch macroinstruction 
from retiring before the displacement has been successfully fetched. 

For conditional branches, the Ebox waits for the Ibox to add the entry to the branch queue 
and then compares the Ibox prediction to the actual result of the branch which is calculated in 
the Ebox. If the branch was mispredicted, the Ebox initiates a microtrap in S5. Because the 
microtrap is in S5, the branch macroinstruction retires but subsequent macroinstructions are 
prevented from completing. 

In some complex conditional branches, the Ebox microcode waits for the branch queue entry to 
become valid before it stores a result calculated by the instruction. This allows the microcode to 
be sure the branch displacement was fetched without a memory management fault or hardware 
error before modifying state. The microcode may have to delay retiring the branch queue entry 
and checking the branch prediction. So SYNC.BDISP accesses the branch queue, and causes an S4 
stall if the entry is not valid, but does not cause the entry to be retired. 

The Ebox signals the Ibox whenever a microword which retires a conditional branch queue entry 
advances into S5 (that is, a microword specifying SYNC.BDISP.TEST.PRED). This causes the Ibox to 
release the alternate branch path PC (the PC of the path not taken by the Ibox prediction). The 
Ebox signals a mispredicted branch at the same time, if there is one. If there is a mispredicted 
branch, the Ibox responds by unwinding the RLOG and resuming macroinstruction fetching at the 
alternate PC address. 
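
The three branch queue functions can be summarized in a short behavioral sketch. It is illustrative only. The function name, the dictionary keys, and the encoding of the prediction bit (True meaning "predicted taken") are assumptions, and the retiring behavior of SYNC.BDISP.RETIRE is inferred from its name rather than stated explicitly above.

```python
from collections import deque

NON_RETIRING = {"SYNC.BDISP"}
RETIRING = {"SYNC.BDISP.RETIRE", "SYNC.BDISP.TEST.PRED"}


def access_branch_queue(queue: deque, mrq: str, actual_taken: bool = False) -> dict:
    """Model one S4 branch queue access; queue entries are predictions (True = taken)."""
    if mrq not in NON_RETIRING | RETIRING:
        raise ValueError(f"not a branch queue function: {mrq}")
    result = {"s4_stall": False, "retired": False, "mispredict": False}
    if not queue:
        # No entry yet: the Ibox has not fetched the displacement, so S4 stalls.
        result["s4_stall"] = True
        return result
    if mrq in RETIRING:
        prediction = queue.popleft()
        result["retired"] = True
        if mrq == "SYNC.BDISP.TEST.PRED":
            # Conditional branch: compare the Ibox prediction with the Ebox result.
            result["mispredict"] = prediction != actual_taken
    return result
```

In this model a mispredict result corresponds to the case in which the Ebox initiates the S5 microtrap and the Ibox resumes fetching at the alternate PC.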

Due to complexity in the branch queue bypass logic, it may happen that one cycle of "unnecessary" 
stall occurs in cases where back-to-back branches are executed. The extra cycle of stall 
happens only if the two branches are in adjacent stages of the Ebox pipeline and the Ibox writes 
the second branch queue entry one cycle before the second branch is in S4, ready to retire 
(i.e., it wouldn't be stalled except for the branch queue stall). In this case the branch queue 
read pointer is being advanced and another branch queue entry is being written. Bypass is not 
implemented for the second branch in this specific case. 

The number of entries in the branch queue is 6. 

8.5.15.6 Operand and Branch Buses 

The transmission of operand information for the source queue, destination queue, and field queue 
occurs via the operand bus. This bus is described in Chapter 7. It carries all the information 
which might be entered into any of these queues, and it has valid bits which tell the Ebox when 
to add entries. 

The operand bus carries information derived from decoding a single operand specifier. Zero, one 
or two source and/or destination queue entries are specified, and zero or one field queue entry. 
Only when the operand is quadword length can more than one source or destination queue entry 
be made. Whether a source or destination queue entry is made depends on whether the operand 
is read, write, or modify access type. (Note that the access type referred to here might not be 
identical to the true access type given in the VAX Architecture Standard, for various reasons.) 

A field queue entry is made for each field operand. The Ibox instruction decode logic determines 
if a particular operand is a field operand. Only certain macroinstructions have a field operand, 
and no macroinstruction has more than one field operand. 

The branch queue receives its information via the branch bus. This bus has one bit of data (a 
prediction status) and a valid bit. A branch queue entry containing the prediction status is added 
in every cycle in which the valid bit is asserted. See Chapter 7 for more information. 

8.5.15.7 Retire Queue 

The retire queue is used by the Ebox to force macroinstructions to retire in order. It contains 
one bit of information, a status indicating whether the Ebox or Fbox is the source of the next 
macroinstruction to retire. The Ebox adds an entry to the retire queue in S3 each time a new 
macroinstruction execution microflow begins. (If there is an S3 stall, the entry is added to the 
retire queue in the first cycle of the stall. Exactly one entry is made whether or not an S3 stall 
occurs for one or more cycles.) The retire queue entry is the FI bit from the instruction context 
register (see Section 8.5.14.1). However, if the FBOX_ENABLE bit in ECR (IPR 125) is not set, the 
retire queue entry is forced to indicate Ebox retire regardless of the FI bit. Similarly, if PSL<FPD>, 
PSL<27>, is set, the retire queue entry is forced to indicate Ebox retire regardless of the FI bit. 
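
The selection of the retire queue entry value can be written as a small function. This is a sketch under assumptions: the function name is hypothetical, and it assumes that a set FI bit denotes an Fbox instruction.

```python
def retire_queue_entry(fi_bit: bool, fbox_enable: bool, psl_fpd: bool) -> str:
    """Return which unit the new retire queue entry selects: 'Ebox' or 'Fbox'."""
    if not fbox_enable or psl_fpd:
        # Either override forces Ebox retire regardless of the FI bit.
        return "Ebox"
    # Assumption: FI set marks an Fbox instruction.
    return "Fbox" if fi_bit else "Ebox"
```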

The retire queue is forced to indicate that the Ebox is next to retire when ECR<FBOX_ENABLE> 
is not set because the Fbox will not receive an operation dispatch from the Ebox 
(F%FBOX_1ST_CYCLE_H will never be asserted). When clear, ECR<FBOX_ENABLE> also disables 
microcode sending of operand data, overriding microcode. The Ebox generally forces a reserved instruction 
microtrap when Fbox instructions are in S4 (see Section 8.5.16.8 for more detail). This microtrap 
flushes the retire queue (and, because the retire queue is empty, the Ebox is automatically selected 
as the RMUX source). 

If the Fbox instruction is MULL, a reserved instruction microtrap does not occur (see 
Section 8.5.16.8). Instead the Ebox microcode executes the MULL. This requires that the Ebox be 
selected as next to retire and is the reason ECR<FBOX_ENABLE> forces the retire queue entry to 
select the Ebox. 

When PSL<FPD> is set, SEQ.MUX/LAST.CYCLE and SEQ.MUX/LAST.CYCLE.OVERFLOW cause the 
microsequencer to dispatch to a specific microcode entry point regardless of the instruction queue 
contents. Since this dispatch is to an Ebox microcode flow which will not send operands to the 
Fbox, the Ebox must be selected in the retire queue (though any previous instruction is not 
affected and retires normally). Otherwise, the Ebox could stall waiting for the Fbox to retire an 
instruction while the Fbox waited for source operands to be sent. That deadlock would only end 
on S3 stall timeout. 

The Ebox examines (without retiring an entry) the retire queue in S4 to determine whether the 
Fbox or the Ebox is the next source of a retiring macroinstruction. Based on the retire queue 
output, the RMUX is set to select either the Fbox or the Ebox as the source of control for S4-initiated 
memory references and most S5 operations. This selection remains in effect until the retire queue 
entry is retired. See Section 8.5.5 for more on how this status is used to control the RMUX. 

If the Ebox is the next to retire a macroinstruction, the retire queue entry is retired in S4 when 
the microword advancing into S5 specified SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW 
and did not specify DISABLE.RETIRE/YES. If the Fbox is the next to retire a macroinstruction, the 
retire queue entry is retired in S4 when the Fbox asserts F%RETIRE_H. In either case the retire 
queue entry is not retired unless the selected operation advances into S5 (i.e., there is no S4 
stall). (Note that a retire queue entry is not retired by the MISC1/RETIRE.INSTRUCTION operation.) 

The retire queue is flushed when a microtrap occurs as well as when the MISC field function 
RESET.CPU is specified. Anytime the retire queue is empty, the Ebox is automatically selected as 
the source of the RMUX. 

Note that it is not possible for the retire queue to have fewer than the necessary number of entries 
in it, except after a microtrap, because each entry is added before it is required. 

The number of entries in the retire queue is 6. 

8.5.15.8 Field Queue 

The field queue carries information about field type source operands for bit-field macroinstructions 
and some other macroinstructions. The information is one bit which indicates whether the 
operand was register mode or not. Two different execution microflows are required for bit-field 
macroinstructions and certain other macroinstructions depending on whether a particular 
operand is register mode. The Ibox provides this information when it adds a source queue entry 
for the operand. Microcode is able to branch conditionally on the status of the field queue. This 
allows execution microflows to decide how to execute the instruction. 

Each entry in the field queue is a one-bit status which indicates whether the associated field 
operand is register mode. Microcode branches on a field queue entry are four way branches, 
though only three of the four outcomes are possible. The following table shows the possible 
branch outcomes. 

Table 8-11: Field Queue Branch 

Condition                                  Resulting Microtest Bus Value
-----------------------------------------  ----------------------------------------------
Field queue empty                          11 (can be execution dispatch target)
Field queue not empty, register mode       01 (start of execution for register mode case)
Field queue not empty, not register mode   00 (start of execution for address mode case)

A branch on the field queue when it is not empty causes the current field queue entry to be 
retired. 
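
The branch outcomes of Table 8-11, together with the retire-on-branch rule, can be sketched as follows. The function name and the representation of each entry as a Python boolean (True for register mode) are assumptions made for illustration.

```python
from collections import deque


def branch_on_field_queue(field_queue: deque) -> str:
    """Return the two-bit microtest bus value; retire the entry if one is present."""
    if not field_queue:
        return "11"                        # empty: nothing is retired
    register_mode = field_queue.popleft()  # a branch on a non-empty queue retires the entry
    return "01" if register_mode else "00"
```

The value "10" never appears, which matches the statement that only three of the four outcomes are possible.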

The field queue has 6 entries. 

When the Ebox is branching on the field queue, it may have to wait for the Ibox to make an 
entry, in which case it loops repeatedly testing the field queue. This condition is similar to a 
stall, but no Ebox stall is involved. When microcode is branching on the field queue and it is 
empty, the signal E_FLQ%FQ_STALL_H is asserted. This tells the S3 stall timeout logic that the 
Ebox is looping on the field queue. If this continues for a long time, a machine check occurs. See 
Section 8.5.25.1 for more detail. 

E_FLQ%FQ_STALL_H is also used by the fault logic. If E_FLQ%FQ_STALL_H is asserted and one 
of I%IMEM_MEXC_H, I%IMEM_HERR_H, or I%RSVD_ADDR_FAULT_H is asserted, then an S3 fault 
condition is detected. After a cycle in which there is no S4 stall (and given that the Ebox is 
next to retire), the fault condition advances into S4 and the appropriate microtrap is requested. 
See Table 8-12 and Section 8.5.19 for more information. 

8.5.15.9 Retiring Instructions 

Retiring a macroinstruction is an important synchronization point between the Ebox and the 
Ibox. When a macroinstruction is retiring, the last of its operations is in S5 and cannot be stalled 
or aborted. The Ebox signals the Ibox so that it can free up certain resources associated with the 
retiring instruction. The Ebox usually retires a retire queue entry at the same time as it retires 
the macroinstruction (the exception is MISC1/RETIRE.INSTRUCTION which doesn't affect the retire 
queue). 

The resources in the Ibox which are freed up by retiring a macroinstruction are a backup PC 
queue entry and a group of RLOG entries associated with that macroinstruction. 

When the retire queue indicates the Ebox is next to retire a macroinstruction, the set of conditions 
required for retiring to occur are: 

• the microword in S5 specifies SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW, and not 
DISABLE.RETIRE/YES, or 

• the MISC1 field function, MISC1/RETIRE.INSTRUCTION, is specified (though the retire queue is 
not affected in this case). 
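
The two Ebox retire conditions above can be expressed as a small predicate. This is a sketch, not the control logic itself: the function name and the string encodings of the SEQ.MUX and MISC1 field values are assumptions, and the model assumes the microword has already advanced into S5.

```python
LAST_CYCLE = {"LAST.CYCLE", "LAST.CYCLE.OVERFLOW"}


def ebox_retire(seq_mux: str, disable_retire: bool, misc1: str) -> tuple:
    """Return (instruction_retired, retire_queue_entry_retired) for the Ebox case."""
    by_last_cycle = seq_mux in LAST_CYCLE and not disable_retire
    by_misc1 = misc1 == "RETIRE.INSTRUCTION"
    # MISC1/RETIRE.INSTRUCTION retires the instruction but leaves the retire queue alone.
    return (by_last_cycle or by_misc1, by_last_cycle)
```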

The Fbox determines its own retire instruction status which it sends through the RMUX when the 
retire queue indicates the Fbox is next to retire a macroinstruction. If the Fbox operation request 
in S4 is advanced to S5 with this condition asserted, the Ebox retires an instruction. 

8.5.15.10 First Part Done 

The Ebox sends the current state of the PSL<FPD> bit to the Ibox on E%FPD_SET_L. If the Ibox 
fetches an opcode and this bit is set, the Ibox stops operation as soon as the opcode has been 
completely fetched. If the instruction is an interrupted instruction that is being resumed, then 
the operand specifiers mustn't be processed again since they may have side effects or may depend 
on data which has been altered by the instruction's execution. 

8.5.15.11 Ebox to Ibox Commands and IPR Accesses 

The Ebox is the source of two signals which immediately affect Ibox operation, and three others 
which cause IPR read and write operations or a load-PC operation. 

The two signals which immediately change Ibox operation are: E%STOP_IBOX_H and 
E%RESTART_IBOX_H. E%STOP_IBOX_H is asserted in S5 when the microword specifies 
MISC/RESET.CPU. E%RESTART_IBOX_H is asserted when the microword in S5 specifies 
MISC/RESTART.IBOX. 

E%STOP_IBOX_H is used to cause the Ibox to stop processing instructions and clear the Ibox GPR 
scoreboard. It does not clear the RLOG or backup PC queue, so the Ibox is still able to restore 
state to that required for a fault. See Chapter 7 and Section 8.5.19. 

E%RESTART_IBOX_H is used to restart the Ibox when it put itself in the stopped state after 
processing the operands for certain complex instructions. 

The Ebox detects its own accesses to Ibox IPRs in S5 just after issuing the request to the Mbox. 
It also decodes MRQ/LOAD.PC to detect a load-PC operation in S5. At that time it asserts one 
of three command strobes to the Ibox. They are E%IBOX_IPR_READ_H, E%IBOX_IPR_WRITE_H, 
and E%IBOX_LOAD_PC_L. The Ebox drives the signal fields E%IBOX_IPR_TAG_H<2:0> and 
E%IBOX_IPR_NUM_H<3:0> with the Wn register file destination for IPR read data and the IPR number, 
respectively. (The full register file address for the destination is 6 bits, but the Ibox appends the 
prefix for Wn registers since all Ibox IPR reads are sent to Wn registers.) For IPR writes and 
load-PC operations the Ibox receives the data when the Mbox forwards it on M%MD_BUS_H<31:0> 
in a later cycle. For read accesses the Ibox returns the data to the designated Wn register. 

Microcode synchronizes load-PC operations by issuing an Mbox operation (possibly 
MRQ/SYNC.MBOX). This synchronization is necessary because the Ibox 
will not be ready to accept the new PC data if a MISC/RESET.CPU occurs before the new PC data is 
forwarded by the Mbox. Any interrupt or exception which occurs after the load-PC will cause the 
Ebox to read the backup PC from the Ibox, and that value must have resulted from the load-PC 
operation. Once the synchronizing Mbox operation is complete, the microcode knows the Ibox has 
the data. 

Ibox IPR writes are synchronized by issuing an MRQ/SYNC.MBOX (or another Mbox operation) after 
the operation. Once the MRQ/SYNC.MBOX (or other Mbox operation) is complete, the microcode 
knows the Ibox has the data. 

8.5.15.12 Loading the PC 

The Ibox maintains all PC information for the NVAX CPU. When microcode executing in the Ebox 
determines that instruction fetching should begin at some address, it sends the starting PC value 
to the Ibox. Conceptually, this is equivalent to loading the PC register. However, the Ibox keeps 
track of a number of PC values, and there isn't really a current PC register. See Chapter 7 for 
more on how PC values are maintained. 

The Ebox sends a new PC value to the Ibox in S5 when the microword specifies LOAD.PC in the 
MRQ field. The PC data is sent via the Mbox. Microcode first ensures that the Ibox is stopped 
and, if necessary, flushes appropriate queues. Note that the RLOG should have been unwound 
beforehand. 

8.5.15.13 Ebox to Ibox Flush Signals 

Microcode is able to flush several entities in the Ibox: the virtual instruction cache (VIC), 
the branch prediction cache (BPC), and the backup PC queue (PCQ). In S5, the Ebox drives 
E%FLUSH_VIC_H, E%FLUSH_BPT_H, and E%FLUSH_PCQ_H, when it decodes MISC1/FLUSH.VIC, 
MISC1/FLUSH.BPC, and MISC1/FLUSH.PCQ, respectively. 

8.5.15.14 Detecting Ibox Incurred Faults and Errors 

There are two kinds of faults which can occur due to Ibox processing. Also, hardware errors can 
occur. When a fault or error occurs, the status is latched. The Ebox effectively detects the fault 
or error when it executes a microword which uses the result of the operation which incurred the 
fault or error. The Ebox causes a microtrap to occur when that same microword is about to be 
advanced from S4 into S5. See Section 8.5.19 for more on microtrap management. 

Some Ibox incurred faults and errors are initially detected by the Ibox, while others are first 
detected by the Ebox. When the Ibox detects a fault or error, it halts operation and asserts one of 
two fault indication signals or an error indication signal which are all received by the Ebox. These 
signals are I%IMEM_MEXC_H (which indicates a memory management fault), I%IMEM_HERR_H 
(which indicates a hardware error), and I%RSVD_ADDR_FAULT_H (which indicates a reserved 
addressing mode). The Ibox only asserts I%RSVD_ADDR_FAULT_H for one cycle, so the Ebox has 
a latch which is set when it is asserted. This latch is reset by MISC/RESET.CPU and by branch 
mispredict microtraps. 

The Ebox ignores Ibox fault conditions until it determines that they apply to the current 
microword. This is done by associating some queue empty condition with the fault status. See 
Table 8-12. 

Faults and errors not detected by the Ibox are reported by the Mbox. For reads, the Mbox sets 
the fault or error bit associated with the target MD register in the register file. For writes, it 
sets the fault or error bit in the appropriate PA queue entry. When the Ebox references the MD 
register or tries to use the PA queue entry with a fault bit set, it detects the fault. 

Faults in memory reads issued by the Ibox as an intermediate step in processing an operand 
specifier (as in register deferred mode) are handled in a special way. When the memory read 
fault or error is detected in the Mbox, it returns a fault/error status instead of data. The Ibox 
latches this fault/error status. If the Ibox was going to use this data as an address (deferred 
mode), it sends the fault/error status with the next specifier related memory request. The Mbox, 
seeing the fault/error status associated with the operation, sends the result to the MD register 
(for reads) or PA queue (for writes) with the same fault/error status. 

Detecting faults in memory reads issued by the Ibox as an intermediate step in processing an 
operand specifier can also occur another way. In the case where the Ibox will not have to issue a 
memory request using the result of the failed request (as in address access type with a deferred 
mode operand specifier), the Ibox reports the error by writing the MD fault or error status bit 
directly. The fault/error status latched in the Ibox is written into the MD fault/error status bits 
when the Ibox writes the MD. 

The table below lists the faults and indicates how each is detected. 



Table 8-12: Detection of Ibox Incurred Faults and Errors 

Fault                                        How Detected
-------------------------------------------  ------------------------------------------------
Instruction stream read fault/error on       Instruction queue empty AND
opcode                                       (I%IMEM_MEXC_H OR I%IMEM_HERR_H)

Instruction stream read fault/error on       (Source queue empty [1] OR E_FLQ%FQ_STALL_H
source operand (including modify type)       asserted) AND (I%IMEM_MEXC_H OR I%IMEM_HERR_H)

Instruction stream read fault/error on       Destination queue empty AND
destination operand (write type)             (I%IMEM_MEXC_H OR I%IMEM_HERR_H)

Instruction stream read fault/error on       Branch queue empty AND
branch displacement                          (I%IMEM_MEXC_H OR I%IMEM_HERR_H)

Memory access fault/error encountered in     Attempt to read an MD register with a fault
processing a source operand (including       bit set
modify type)

Memory access fault/error encountered in     Attempt to use a PA queue entry with a fault
processing a destination operand (write      bit set
type)

Reserved addressing mode on source operand   (Source queue empty [1] OR E_FLQ%FQ_STALL_H
                                             asserted) AND I%RSVD_ADDR_FAULT_H

Reserved addressing mode on destination      Destination queue empty AND
operand                                      I%RSVD_ADDR_FAULT_H

Reserved opcode                              Microsequencer Dispatch

[1] In this context, source queue empty includes the case where the microword in S3 requires two 
source queue entries to advance, but only one entry is present in the source queue. 



It is not possible for the Ibox to assert both I%RSVD_ADDR_FAULT_H and either of I%IMEM_MEXC_H 
or I%IMEM_HERR_H at the same time. The Ibox stops operation as soon as it encounters one of 
these two faults, so the other cannot occur after one is detected. 
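
The rows of Table 8-12 that depend on Ibox fault signals reduce to a queue-empty condition combined with the latched signal state. The following sketch restates those rows as a predicate. The function and parameter names are hypothetical, and the MD register, PA queue, and reserved opcode rows are deliberately left out because they are detected by other means.

```python
def ibox_fault_detected(queue: str, queue_empty: bool, fq_stall: bool,
                        imem_mexc: bool, imem_herr: bool, rsvd_addr_fault: bool) -> bool:
    """Apply the Ibox-signalled rows of Table 8-12 for the queue the microword needs."""
    istream_fault = imem_mexc or imem_herr
    if queue == "source":
        # Looping on the field queue counts like an empty source queue.
        return (queue_empty or fq_stall) and (istream_fault or rsvd_addr_fault)
    if queue == "destination":
        return queue_empty and (istream_fault or rsvd_addr_fault)
    if queue in ("instruction", "branch"):
        # The table lists no reserved addressing mode row for these queues.
        return queue_empty and istream_fault
    raise ValueError(f"unknown queue: {queue}")
```

A fault signal alone is never sufficient in this model: the microword must also be waiting on the queue whose entry the Ibox failed to supply.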

8.5.16 Ebox-Fbox Interface 

The Fbox executes independently of the Ebox but is dependent on the Ebox for delivery of source 
operands and storing of results. Floating point macroinstructions are decoded by the Ibox exactly 
like any other macroinstruction. The Ebox is dispatched to an execution microflow. This microflow 
delivers the source operands to the Fbox in S3 of the pipeline. Once the operands are delivered, 
the microflow is done. The Fbox returns the result in S4, along with any faults it might have 
detected. The Ebox keeps track of whether the Fbox macroinstruction is next to retire using the 
retire queue (see Section 8.5.15.9 and Section 8.5.15.7). Once the Fbox is next to retire, the Ebox 
may, at the Fbox's request, access the destination queue for the Fbox to determine where the 
Fbox results are to be written. When the Fbox indicates its last execution cycle, the Ebox retires 
a retire queue entry and updates the PSL with an Fbox supplied condition code. 



8.5.16.1 Fbox Opcode and Operand Delivery 

The Ebox prepares to deliver operands during S3 when the microword specifies FOP.VALID in the 
MISC1 field. The opcode<8:0> for the instruction is delivered from the Microsequencer late in S2, 
so that the Fbox can decode the opcode before the operands arrive. The operands are available 
at the beginning of S4. They come from the output of the bypass muxes so that result data from 
the most recent S4 (Ebox or Fbox) operation is bypassed if necessary. Anything which stalls S3 
in the Ebox, stalls Fbox operand delivery (this includes S4 stalls). Along with the operands, the 
Ebox sends the current value of PSL<FU>. 

If the Ebox detects a fault or error associated with an Fbox source operand, it indicates this to 
the Fbox. The Fbox carries this information along its pipeline and indicates the fault and/or error 
when the Ebox is retiring the Fbox operation. This is how Fbox source operand fault microtraps 
are delayed until all preceding macroinstructions have retired. The Ebox ignores source operand 
faults (which proceed down the pipeline to S4) when the Fbox is next to retire. 

8.5.16.2 Fbox Result Handling 

The Ebox handles writing of Fbox results in S4 and S5. When the current retire queue entry 
indicates the next macroinstruction to retire is to come from the Fbox, the Ebox waits for the Fbox 
to assert F%STORE_H or F%RETIRE_H. Either or both may be asserted. If F%STORE_H is asserted, 
the Ebox accesses the destination queue and issues a memory store or a GPR write, depending on 
the MDEST bit in the current destination queue entry. (See Section 8.5.17 for the exact definition 
of memory store.) 

The Fbox indicates it is retiring an instruction by asserting the signal F%RETIRE_H. In response 
to this signal, the Ebox retires the current retire queue entry. The Fbox sends a map specifier 
which tells the PSL logic in S5 of the Ebox pipeline how to set the PSL condition code bits based 
on the Fbox condition code. There may be an Fbox result store at the same time as a retire. 

The storing of Fbox results is handled exactly like the storing of Ebox results in the pipeline. The 
request is made in S4, through the RMUX. The Fbox supplies the data length for the store. (It 
derives the data length from the opcode.) If there is no stall or fault, the operation is advanced 
into S5 where the write is done unconditionally. Condition code updates are done in S5, too. The 
stalls which apply to this operation are the same as for an Ebox microword doing a store. The 
destination queue and PA queue must have valid entries and the Mbox must be ready, if the 
Fbox is doing a store. The retire queue must indicate the Fbox for an Fbox store or retire to be 
allowed. Otherwise the Fbox store or retire is stalled. 

8.5.16.3 Fbox Store Stall 

In some cases the Fbox asserts F%STORE_H to indicate it has result data to store and then asserts 
F%STORE_STALL_H to abort the store. This is done because certain Fbox operations may take an 
extra cycle, depending on the actual data pattern. F%STORE_STALL_H is asserted too late for the 
Ebox to suppress the store request to the Mbox (if the result is supposed to be stored to memory). 
If a store is forwarded to the Mbox and is then revoked by F%STORE_STALL_H, the Ebox asserts 
E%EM_ABORT_L early in the next cycle to abort the EM_LATCH operation and purge the EM_LATCH. 
This is the same mechanism used to abort EM_LATCH operations when an Ebox pipeline abort 
occurs (see Section 8.5.17.2). 

Due to complexities in the Mbox (see Section 8.5.17.2), the Ebox ignores M%PA_Q_STATUS_H<0> 
in cycles in which E%EM_ABORT_L is asserted because of a previous F%STORE_STALL_H assertion. 
In these cycles, it behaves as if M%PA_Q_STATUS_H<0> is deasserted. 

Ignoring M%PA_Q_STATUS_H<0> and behaving as if it is deasserted has the effect of unconditionally 
stalling the Fbox store (which is always ready in these cases in the current implementation). This 
means there is one cycle of additional latency beyond that introduced by the Fbox aborting the store. 
Note this only occurs when E%EM_ABORT_L is actually asserted. If the aborted store was never sent 
to the Mbox, M%PA_Q_STATUS_H<0> is not ignored. 
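
The interaction between the store abort and the PA queue status can be sketched as follows. This is a simplified model under assumptions: the function and parameter names are hypothetical, signals are represented by their logical sense (True means asserted, regardless of the _H or _L polarity of the wire), and the timing is collapsed into a single evaluation for the cycle after the revoked store.

```python
def abort_cycle(store_sent_to_mbox: bool, f_store_stall: bool, pa_q_status0: bool) -> tuple:
    """Return (em_abort_asserted, pa_q_status0_as_seen_by_ebox) for the following cycle."""
    # The abort is needed only if the revoked store actually reached the Mbox.
    em_abort = store_sent_to_mbox and f_store_stall
    # While aborting, the Ebox behaves as if M%PA_Q_STATUS_H<0> were deasserted.
    effective_status = pa_q_status0 and not em_abort
    return (em_abort, effective_status)
```

When the first element is True, the second is always False, which corresponds to the unconditional extra cycle of Fbox store stall.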

8.5.16.4 Fbox Destination Scoreboard 

The Ebox maintains state to detect pending Fbox stores to GPRs in the Fbox destination 
scoreboard. If any Ebox or Fbox operation attempts to source one of the GPRs which the Fbox is 
scheduled to update, the Ebox stalls and Fbox operand delivery is stalled. The Fbox destination 
scoreboard is implemented as part of the destination queue. This section describes the Fbox 
destination scoreboard functionality of the destination queue. See Section 8.5.15.3 for more on 
the main function of the destination queue. 

The Fbox destination scoreboard consists of a pair of comparators and a write-pending bit 
associated with each destination queue entry. If an Fbox update of a particular GPR is pending, 
the write-pending bit in the destination queue entry for that store is set. The bit is set in 
S4, by specifying F.DEST.CHECK in the MISC2 field. If the Fbox source operands are all sent 
by one microword, that microword specifies MISC2/F.DEST.CHECK. If a sequence of more than 
one microword sends the source operands to the Fbox, the MISC2/F.DEST.CHECK is in the last 
microword. 

Whenever a GPR is accessed using the source queue (A/S1 and/or B/S2) in S3, every destination 
queue entry with a set write-pending bit is compared with the two outputs of the source queue. 
A match, or hit, causes a stall if the source queue output which hits is actually specified by the 
microword in the A or B fields. For a hit to cause a stall, the write-pending bit in the destination 
queue must be set. Additionally, the source queue output which hits must specify a GPR access 
(i.e., it must not point to an MD register or contain literal data). If these conditions are met, the 
S3 operation is stalled. 
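
The hit conditions above can be collected into one check. The sketch below is a behavioral illustration only: the class and function names are hypothetical, and the hardware uses a pair of comparators per destination queue entry rather than nested loops.

```python
from dataclasses import dataclass


@dataclass
class DestQEntry:
    gpr: int               # GPR number held in the destination queue entry
    write_pending: bool    # set in S4 by MISC2/F.DEST.CHECK


@dataclass
class SourceOutput:
    kind: str                  # "gpr", "md", or "literal"
    index: int                 # GPR or MD index; ignored for literals
    used_by_microword: bool    # specified by the microword in the A or B field


def scoreboard_stall(dest_queue, source_outputs) -> bool:
    """Return True if the S3 operation must stall on a pending Fbox write."""
    for src in source_outputs:
        # Only GPR accesses actually specified by the microword can hit.
        if src.kind != "gpr" or not src.used_by_microword:
            continue
        for entry in dest_queue:
            if entry.write_pending and entry.gpr == src.index:
                return True
    return False
```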

Note that the above check includes destination queue entries with their MDEST bit set. So pending 
writes to memory (using PA queue addresses) may cause a scoreboard hit stall. This is not done to 
prevent the Ebox from reading a GPR before a pending Fbox write to the GPR completes. Instead, 
it is done to prevent the Ebox from reading a GPR when the Ibox must write an incremented or 
decremented value first. This occurs when the Ibox processes an autoincrement or autodecrement 
specifier with write access type for an Fbox instruction. In processing the specifier, the Ibox CSU 
can be stalled for some reason, and thus be delayed from writing the new value to the GPR. To 
handle this case, the Ibox sends the GPR number with all destination queue entries. If the 
Ebox reads a GPR which was used in a destination specifier, the scoreboard hit stall prevents the 
read until the destination queue entry is retired. 

Because of the minimum latency in the Mbox in processing specifier accesses, it is known that 
the Ibox CSU will update the GPR before the associated PA queue entry becomes valid, and the 
destination queue entry will not be retired until the PA queue entry becomes valid. (Actually, the 
destination queue entry is effectively retired before the Ebox "knows" that the PA queue entry is 
not valid, but then an S4 stall exists which will last until the PA queue entry becomes valid. This 
stall will also stall S3, so the GPR access will be prevented until the GPR is valid. This is why all 
RMUX S4 stalls also stall S4 and S3 when the Fbox is next to retire an instruction.) 

In the event that a modify access type specifier is processed, an entry is made in the source and 
destination queues for the same specifier. If it is a register mode specifier, it does not cause 
a deadlock because the MISC2/F.DEST.CHECK operation which sets the write-pending bit in the
destination queue for the entry is not done until the last microword of the execution microflow 
is in S4. By that time all the operands have been sent to the Fbox. If the addressing mode is 
some memory access mode, the operand bus bits which carry the GPR number when processing a 
write access type specifier are used instead to carry the index of the MD register which will hold 
the source data. Interpreting this MD index as a GPR number could cause lost performance if a 
subsequent instruction accesses the GPR with the same index as that MD. (Deadlock doesn't occur
for the same reason as before.) To prevent possible loss in performance, the Ebox forces the index
bits to 1 as they are written into the destination queue GPR field for modify access type operands
only. This has the effect of specifying the PC in the destination queue. If a subsequent
instruction does access the PC directly, then a stall will not hurt since this is an UNPREDICTABLE
case. (The Ebox supplies the value 0 when the PC is specified in this way.)
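
As a minimal sketch of the index forcing described above, assuming a 4-bit GPR field in which the PC is register 15 (all index bits set); the function name dest_queue_gpr_field is hypothetical:

```python
PC = 0xF  # the PC is GPR 15, so all four index bits are 1


def dest_queue_gpr_field(operand_bus_index: int, modify_access: bool) -> int:
    """Value written into the destination queue GPR field (assumed 4 bits wide)."""
    if modify_access:
        # The bus carries an MD index here, not a GPR number, so the
        # index bits are forced to 1, which names the PC instead.
        return PC
    return operand_bus_index & 0xF
```

With this forcing, an MD index such as 3 can no longer be mistaken for R3 by the scoreboard comparators.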

NOTE 

When the Ebox is next to retire an instruction and is writing to a write access type
destination operand, it will stall in S4 if the PA queue is not valid. This causes an
S3 stall. Thus the case which motivated the above special scoreboarding case for Fbox
destinations cannot occur for Ebox instructions. In fact, the only reason it can occur for
Fbox instructions is because there are several "hidden" pipeline stages between S3 and
S4 when the Fbox processes an instruction. These extra pipeline stages allow the Fbox
to accept new instructions and their associated source operands before it has retired
the current instruction. This combined with the fact that the Ibox can process "simple"
specifiers for new instructions even while the CSU is stalled processing a complex write
access type specifier from a previous instruction is what leads to the need for the special
scoreboard case described above.

The Ebox will access ahead of the current destination queue entry as part of the Fbox destination 
scoreboard function. A pointer called the FDest pointer is maintained which may point to an 
entry which is after the front entry in the FIFO queue. Normally, it points to the current entry. 
However, in circumstances where the Fbox is next to store a result, it is incremented ahead of 
the current destination queue entry pointer. 

When the microword in S4 specifies MISC2/F.DEST.CHECK, the Ebox checks that the destination 
queue entry at the FDest pointer is valid. If it isn't, S4 stalls (stalling S3 as well). If the 
destination queue entry is valid, the associated write-pending bit is set. If the DL is quadword, 
then the bit associated with the next destination queue entry is also set. The FDest pointer is 
incremented by one, or by two if the DL is quadword. The write-pending bits are set in S4 even if 
there is an S4 stall. The FDest pointer is incremented as the operation advances into S5, when 
there are no S4 stalls. 
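
The F.DEST.CHECK sequence above can be modeled roughly as below. This is a sketch under stated assumptions: the queue is treated as a circular list, the second entry of a quadword pair is not separately checked for validity, and the names Entry, DestQueue, and f_dest_check are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    valid: bool = False
    write_pending: bool = False


@dataclass
class DestQueue:
    entries: List[Entry]
    fdest: int = 0  # FDest pointer, modeled as an index that wraps around

    def f_dest_check(self, quadword: bool, other_s4_stall: bool) -> bool:
        """Model one S4 cycle of MISC2/F.DEST.CHECK; return True if S4 stalls."""
        n = len(self.entries)
        if not self.entries[self.fdest].valid:
            return True  # entry not yet valid: S4 stalls, which stalls S3 too
        # Write-pending bits are set in S4 even if another S4 stall is present.
        self.entries[self.fdest].write_pending = True
        if quadword:
            self.entries[(self.fdest + 1) % n].write_pending = True
        if other_s4_stall:
            return True  # pointer only moves when the operation advances to S5
        self.fdest = (self.fdest + (2 if quadword else 1)) % n
        return False
```

Calling f_dest_check repeatedly during a stall is harmless in this model because setting an already set bit changes nothing, and the pointer moves only once.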

NOTE 

The DL supplied in the instruction queue with Fbox instructions is the length of the 
result. 

Flow-thru bypass ensures that the S3 microword is stalled if it is accessing a GPR and that GPR 
is specified by a destination queue entry whose write-pending bit is being set by the microword 
in S4.

Write-pending bits in the destination queue are reset in S4 as the Fbox writes results, even if 
the MDEST bit is set in the destination queue entry being retired. Flow-thru bypass ensures that 
an S3 stall due to the scoreboard is broken in the cycle in which the Fbox drives the result to the 
Ebox. This means the result in S4 (after the RMUX) is bypassed to E_BUS%ABUS_L<31:0> and/or
E_BUS%BBUS_L<31:0> in these cases.

In S4, when the Fbox stores a result, the write-pending bit of the destination queue entry is 
reset. This means that destination queue entry can no longer cause a scoreboard hit stall. The 
bit is cleared even if RMUX S4 stalls. In all cases this is safe either because the destination queue 
entry has MDEST set or because the particular RMUX S4 stall also causes an S4 stall which in turn 
causes an S3 stall which prevents Fbox operand delivery.

The write-pending bits and all destination queue pointers are reset when E_MSC%FLUSH_EBOX_H
is asserted. This happens in every microtrap, including the power-up microtrap.

8.5.16.5 Fbox Fault and Error Management 

As mentioned above, the Fbox latches source operand fault and error information and carries it 
along with its other instruction related information. Also, the Fbox may encounter a fault in the 
course of computing the result. All these faults and errors are presented by the Fbox when it 
requests the RMUX. The Ebox responds by signaling a microtrap to the Microsequencer once the 
retire queue indicates the Fbox. Before the retire queue points to the Fbox, the Ebox ignores the 
fault status coming from the Fbox. 

The Ebox detects Ibox incurred faults and errors for Fbox operands as described in Table 8-12, 
but instead of handling them directly, it passes the fault/error status to the Fbox. The Fbox 
doesn't wait for the operand valid signal when a fault or error status is asserted, even though 
there isn't valid data. This breaks a stall which might never end otherwise, since the Ibox stops 
processing operand specifiers when it encounters a fault or error. 

NOTE 

The Fbox treats the data which comes with the fault/error status as UNPREDICTABLE. 
Also the Fbox breaks the stall on any operands which follow an operand with an 
associated fault or error. The Ibox stops processing operand specifiers when it 
encounters a fault or error. If the Fbox didn't break the stall and propagate the 
fault/error to the RMUX, the CPU would hang. 

If there isn't a fault or error being signaled by the Fbox, there could still be a destination operand 
fault or error. If the Fbox is requesting the RMUX and indicating a destination queue indirect 
store, the Ebox checks for a destination operand fault or error (see Table 8-12). If there is one, 
the appropriate microtrap is forced. 

Most Fbox faults, and all Fbox errors, result in VAX architecture exceptions of the fault type. This 
means most Fbox faults, and all errors, are taken in S4 when the operation is about to advance 
into S5. Integer overflow is a trap in the VAX architecture sense, and causes a microtrap late in 
S5. 

Fbox operand faults and errors have higher priority in the Microsequencer than Fbox originated 
data faults. Fbox operand faults cause the same microtraps as would be taken if that fault or 
error was detected in an Ebox instruction. Fbox originated data faults cause a floating fault 
microtrap, provided there aren't any operand faults or errors. See Section 8.5.19.7 for more on 
how microcode determines the cause of the microtrap. 

8.5.16.6 Ebox to Fbox Commands 

The Ebox asserts the signal E%FLUSH_FBOX_H when the microword in S6 specifies RESET.CPU in
the MISC field. This has the effect of resetting the Fbox and clearing its pipeline of all operations.

8.5.16.7 Summary of Fbox-Ebox Signals

The following signals are driven by the Ebox to the Fbox. 

• E%FOPCODE_H 

This 9-bit bus carries the full opcode for Fbox operations. (This bus is actually driven by the 
Microsequencer.) 

• E%FBOX_1ST_CYCLE_L

This bit indicates there is a valid Fbox opcode on E%FOPCODE_H. (This signal is actually 
driven by the Microsequencer.) 

• E%ABUS_H and E%BBUS_H 

These 32-bit busses carry the source operand(s). 

• E%FDATA_VALID_H 

This signal tells the Fbox that all operands being sent to it are valid. The Fbox knows, from 
decoding the opcode, exactly what data is being sent on E%ABUS_H<31:0> and E%BBUS_H<31:0>.

• E%A_SHLIT_H and E%B_SHLIT_H

These signals indicate the data on E%ABUS_H<31:0> and E%BBUS_H<31:0>, respectively, is a
6-bit short literal value extracted from the instruction stream. Special data formatting is
required by the Fbox.

• E%PSL_FU_H 

The current PSL<FU> value for use by the Fbox in deciding whether to signal floating point 
underflow faults or not. 

• E%F_MMGT_FLT_H, E%F_MEM_ERR_H, and E%F_RSVD_ADDR_MODE_H

These signals tell the Fbox that there is a fault or error associated with the source operands. 
The Fbox carries this status down the pipeline so that it is handled after instructions which 
are already in the Fbox pipeline. 

• E%FLUSH_FBOX_H 

This signal causes the Fbox to clear its pipeline of all operations. 

• E%RETIRE_OK_H

This signal tells the Fbox whether to stall if it has an instruction to retire. The Fbox
stalls if it wants to retire an instruction and this signal is not asserted.

• E%STORE_OK_H

This signal tells the Fbox whether to stall if it has a result to store. The Fbox stalls if
it wants to write a result and this signal is not asserted, even if it also wants to retire
an instruction and E%RETIRE_OK_H is asserted.
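
The way the two OK signals gate the Fbox output stage can be sketched as follows. This is an illustrative reading of the two signal descriptions above; the function and parameter names are hypothetical.

```python
def fbox_output_stall(wants_retire: bool, wants_store: bool,
                      retire_ok: bool, store_ok: bool) -> bool:
    """Return True if the Fbox must stall its output stage this cycle."""
    if wants_store and not store_ok:
        # A blocked store stalls the Fbox even when retire is permitted.
        return True
    if wants_retire and not retire_ok:
        return True
    return False
```

For example, an instruction that both stores and retires proceeds only when both E%STORE_OK_H and E%RETIRE_OK_H are asserted.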

The following signals are driven by the Fbox to the Ebox. 

• F%INPUT_STALL_H

This signal causes the Ebox to stall in S3 if it is attempting to send operands to the Fbox. 

• F%STORE_STALL_H 

This signal is asserted by the Fbox when it is asserting F%STORE_H but isn't able to supply 
valid data. 

• F%FBOX_RESULT_H

This 32-bit bus carries Fbox results to the Ebox. 

• F%CC_N_H, F%CC_Z_H, and F%CC_V_H

These are the three Fbox condition code bits: Negative, Zero, and Overflow.

• F%RETIRE_H 

This control signal tells the Ebox the Fbox is retiring an instruction in this cycle.

• F%STORE_H

This control signal tells the Ebox the Fbox is storing a result in this cycle. 

• F%CC_MAP_H<1:0>

This is the map specifier which tells the Ebox how to update the PSL condition code bits. 

• F%FBOX_DL_H<1:0> 

This is the data length used by the Ebox for an Fbox store. 

• F%MMGT_FAULT_H 

Signals a memory management fault for one of the currently retiring instruction's source 
operands. 

• F%MERR_H

Signals a memory access hardware error for one of the currently retiring instruction's source 
operands. 

• F%RSVD_ADDR_MODE_H

Signals a reserved address mode fault for one of the currently retiring instruction's source 
operands. 

• F%RSV_H 

Signals a reserved operand fault for one of the currently retiring instruction's source operands. 

• F%FOV_H

Signals that a floating point overflow fault resulted from the currently retiring instruction.

• F%FU_H

Signals that a floating point underflow fault resulted from the currently retiring instruction.

• F%FDBZ_H

Signals that a floating point divide-by-zero fault resulted from the currently retiring instruction.

8.5.16.8 Fbox Disabled Mode 

The ability to operate with the Fbox disabled is provided in the Ebox. When the Fbox is disabled, 
all floating point macroinstructions, including all floating point CVT macroinstructions, cause 
reserved instruction faults. MULL is handled in microcode. 

The Fbox enable bit is in IPR 125, ECR (see Section 8.5.22). If it is not set, Ebox hardware functions 
are altered as follows: 

• Assertion of E%FBOX_1ST_CYCLE_L to the Fbox is disabled (in the Microsequencer).

• The entry made in the retire queue is overridden to specify Ebox instruction retire. 

• A reserved instruction fault is signaled to the Microsequencer when the first microword of 
any Fbox execution microflow is about to advance into S5, except if that microword specifies 
MISC/MULL.

With the Fbox disabled, each floating point macroinstruction causes a fault (a VAX architecture
reserved instruction fault) when the first microword of its execution microflow is about to 
advance into S5. This occurs for all floating point macroinstructions, including floating point 
CVT instructions. 
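
The handling of the first microword of an Fbox execution microflow can be summarized with the sketch below. It is a simplified model only; the function name and the returned labels are hypothetical, chosen for illustration.

```python
def first_microword_outcome(fbox_enabled: bool, specifies_mull: bool) -> str:
    """Outcome for the first microword of an Fbox execution microflow."""
    if fbox_enabled:
        return "dispatch_to_fbox"
    if specifies_mull:
        # MISC/MULL is exempt from the fault; MULL is completed in microcode.
        return "ebox_mull_microflow"
    # Signaled as the microword is about to advance into S5.
    return "reserved_instruction_fault"
```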

Microcode can branch conditionally on the Fbox disable bit. The first microword of the MULL 
execution microflow specifies MISC/MULL and branches conditionally on the Fbox disable status. 
If the Fbox is enabled, the branch is to a microflow which dispatches the operation to the Fbox. 
If the Fbox is disabled, the branch is to an Ebox execution microflow which completes the MULL.

8.5.17 Ebox-Mbox Interface

The Ebox to Mbox interface has a memory request function and a returned read result function. 
The Ebox issues memory requests by sending a command, address, and possibly write data to 
the Mbox. The Mbox returns read results by writing them directly into the register file. Faults 
and errors encountered by the Mbox in completing the operation are reported in one of three ways
depending on the operation. 

NOTE 

When the Ebox initiates a memory read by sending a request to the Mbox, it specifies 
the register which will receive the memory data in the DST field of the microword. 
This has the side effect, when the microword is in S5, of writing that register with
the value on E_BUS%WBUS_L<31:0>. Normally this register is written by the Mbox after
this, before the particular register is read again. However, an exception can prevent 
the Mbox write and leave the register containing effectively garbage data. 

There are three kinds of memory access requests issued by the Ebox: reads, writes, and stores. 
Reads are requests for memory data to be returned to a Wn or GPR register in the register file. 
The Ebox supplies the address directly. Writes are requests that data be written to memory. The 
address and data are both supplied directly by the Ebox. Stores are requests that data be written 
to the address in the current PA queue entry in the Mbox. The Ebox only supplies the data for 
stores. 

There are several control operations the Ebox can request of the Mbox. There are three kinds of 
TB invalidate requests. It can synchronize to the Mbox, causing a stall until the Mbox finishes 
memory management checks for the current request. Also, probe, write check, TB fill, and 
processor register read and write operations are available. 

The Ibox issues operand data reads to MD registers on behalf of the Ebox as it processes operand 
specifiers. The Ebox simply uses the data when it is returned. The Ibox also issues a request 
that is the first half of a store. This supplies an address for the Mbox to translate and then enter 
into the PA queue. The Ebox eventually issues a store request which uses the address in the PA
queue to do the write. 

Memory management faults encountered in memory reads and writes (not stores) issued by the 
Ebox are reported by the Mbox asserting the signal M%MME_TRAP_L which is received by the 
Microsequencer. This causes an immediate microtrap and Ebox pipeline abort. 

Memory management faults encountered in memory reads initiated by the Ibox on behalf of the 
Ebox result in the Mbox asserting M%MME_FAULT_H which sets the memory management fault
status bit associated with the target MD register in the register file. The Ebox detects the fault 
when a microword sources that particular MD register. 

Faults for stores are reported by the Mbox as soon as the PA queue entry is valid. The Ebox 
detects the fault when a microword attempts to issue a store request. 

Hardware errors in memory reads issued by the Ebox are reported by asserting M%HARD_ERR_H 
in the cycle in which read data is written into the register file. The data is generally incorrect, 
since an error occurred. The register file write can't be to an MD register since it is issued directly 
by the Ebox. There aren't fault bits in the register file to receive the error status for registers 
other than the MD registers. So, when the Ebox detects an MD port write to a register other than
an MD and the error status is asserted, the Ebox forces an immediate microtrap. This microtrap
is not delayed by any S3 or S4 stalls.

Hardware errors in memory reads initiated by the Ibox on behalf of the Ebox result in the Mbox
asserting M%HARD_ERR_H as it writes the target MD register in the register file. This sets the error 
status bit associated with the target MD register. The Ebox detects the error when a microword 
sources that particular MD register. 

Hardware errors for stores are reported by the Mbox as soon as the PA queue entry is valid. The 
Ebox detects the error when a microword attempts to issue a store request. 

TB parity errors are a special case. Whenever a TB parity error is encountered, the Mbox asserts 
M%TB_PERR_TRAP_L. The Microsequencer initiates an immediate asynchronous hardware error 
microtrap when M%TB_PERR_TRAP_L is asserted. This could happen as a result of Mbox processing 
of any Ebox memory reference, Ibox operand prefetch reference, or Ibox instruction fetch or 
prefetch which uses the TB. 

All Mbox requests except store are specified in the MRQ field of the microword. The store request 
is implicit in Ebox or Fbox result storing through the RMUX. All Mbox requests are issued in S4. 
The table below shows the requests the Ebox can send to the Mbox. See Chapter 12 for more 
detail on each operation. 



Table 8-13: Ebox Mbox Requests

Request                 Addressing  Access    Mode        Operation
Mnemonic                            Check     Used(1)     Description
----------------------  ----------  --------  ----------  ------------------------------------
MRQ/READ.V.RCHK         virtual     read      current     read virtual memory
MRQ/READ.V.WCHK         virtual     write     current     read virtual memory and check for
                                                          write access
MRQ/READ.V.NOCHK        virtual     —         —           read virtual memory with no access
                                                          check
MRQ/READ.V.LOCK         virtual     write     current     read-lock virtual memory
MRQ/READ.P              physical    —         —           read physical memory
MRQ/READ.PR             physical    —         —           read processor register
MRQ/PROBE.V.RCHK        virtual     read      mode        Probe byte address for read - return
                                                          3-bit probe status to register file
MRQ/PROBE.V.RCHKNOFILL  virtual     —         —           Probe byte address for presence in
                                                          TB - return 1-bit status to register
                                                          file, but don't fill TB if entry is
                                                          not already in TB
MRQ/WCHK                virtual     write     current     check that memory location can be
                                                          written
MRQ/WRITE.V.WCHK        virtual     write     current     write virtual memory
MRQ/WRITE.V.NOCHK       virtual     —         —           write virtual memory without access
                                                          checks
MRQ/WRITE.V.UNLOCK      virtual     write     current     write-unlock virtual memory
MRQ/WRITE.P             physical    —         —           write physical memory
MRQ/WRITE.PR            physical    —         —           write processor register
MRQ/PROBE.V.WCHK        virtual     write     mode        Probe byte address for write - return
                                                          3-bit probe status to register file
STORE(2)                virtual(3)  write(3)  current(3)  write to physical address in PA queue
MRQ/LOAD.PC             —           —         —           send the data to the Ibox to be used
                                                          as the new PC
MRQ/SYNC.MBOX           —           —         —           synchronize with memory management
                                                          check from previous Mbox request by
                                                          issuing a form of nop
MRQ/TB.TAG.FILL         virtual     —         —           directly load TAG part of "current"
                                                          TB entry
MRQ/TB.PTE.FILL         —           —         —           directly load PTE part of "current"
                                                          TB entry
MRQ/TB.INVAL.SINGLE     virtual     —         —           invalidate single TB entry, if
                                                          present
MRQ/TB.INVAL.PROCESS    —           —         —           invalidate all TB entries for current
                                                          process
MRQ/TB.INVAL.ALL        —           —         —           invalidate all TB entries

(1) Current means CUR_MOD from the PSL, mode means contents of MMGT.MODE.

(2) This operation is not initiated through the MRQ field. It is issued by microwords specifying DST/DST and Fbox operations
with F%STORE_H asserted, given that the destination queue entry indicates a memory destination.

(3) Translation and access check done previously by the Mbox.

— means don't care, doesn't apply.



The store operation in the above table is not specified in the MRQ field. Each destination queue 
indirect result store which is to memory (as opposed to a GPR) is turned into a Mbox store request. 
The Mbox writes the data received with this request to the address extracted from the PA queue. 
(Two address entries in the PA queue are needed for unaligned stores.) 

The load-PC operation is accomplished with the aid of the Mbox (MRQ/LOAD.PC). The Mbox's part 
is to pass the data (PC) on E%WBUS_H<31:0> to the Ibox via M%MD_BUS_H<31:0>. The Ebox signals
the Ibox that the new PC value is coming. 

The information sent to the Mbox when the Ebox issues a command is shown in the following
table. The information, except E%WBUS_H<31:0> data, is valid in S4. The command information
is driven early in S4, while the address isn't valid until late in S4. E%WBUS_H<31:0> data is valid
early in S5. The table shows the source of each item. See Chapter 12 for the encoding of these
fields.

Table 8-14: Ebox Memory Request Information Busses

Signal               Source                              Description
-------------------  ----------------------------------  ---------------------------------------------
E%EBOX_CMD_H<4:0>    decoded from MRQ and DST microword  Request - command code
                     fields
E%WBUS_H<31:0>       E_BUS%WBUS_L<31:0>                  write data, not ready until S5 (only needed
                                                         for write type and store operations)
E%VA_BUS_L<31:0>     VA register with bypass             address (or PTE in case of TB.PTE.FILL)
E%EBOX_TAG_H<4:0>    DST microword field                 address in register file where read or probe
                                                         result is to go
E%EBOX_AT_H<1:0>     decoded from MRQ microword field    access type for operation
E%EBOX_DL_H<1:0>     DL register                         data length for access
E%EBOX_VIRT_ADDR_H   decoded from MRQ microword field    Indicates virtual address - translation needed
E%NO_MME_CHECK_H     decoded from MRQ microword field    Indicates no access check should be done


This information is all latched by the Mbox in the EM_LATCH. This latch can only hold one 
command. Once it is full the Mbox will ignore Ebox requests until it is empty again. It is 
emptied when the Mbox request completes. 
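
The single-entry behavior of the EM_LATCH can be modeled as below. This is a minimal sketch; the class EMLatch and its methods offer and complete are hypothetical names, and the request is represented only by its mnemonic string.

```python
from typing import Optional


class EMLatch:
    """One-entry latch between the Ebox and the Mbox (illustrative model)."""

    def __init__(self) -> None:
        self.command: Optional[str] = None

    @property
    def full(self) -> bool:
        return self.command is not None

    def offer(self, command: str) -> bool:
        """Ebox presents a request; return True if the latch accepted it."""
        if self.full:
            return False  # the Mbox ignores Ebox requests while the latch is full
        self.command = command
        return True

    def complete(self) -> str:
        """The Mbox request completes, which empties the latch."""
        if self.command is None:
            raise RuntimeError("no request is latched")
        done, self.command = self.command, None
        return done
```

A request that is refused by offer must be held by the sender and presented again once the latch has emptied.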

To process requests from the Ebox and from the Ibox, the Mbox receives the CUR_MOD bits from
the PSL and the MMGT.MODE register contents. The CUR_MOD bits are normally used as the
access mode for a request's TB check. The MMGT.MODE bits are used only when the request is a
PROBE.V.RCHK, PROBE.V.RCHKNOFILL, or PROBE.V.WCHK. Note that the Mbox uses the CUR_MOD field
for all Ibox-initiated requests at all times, so it must receive both mode fields simultaneously.
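
The mode selection rule can be written as a short sketch. The function name tb_check_mode is hypothetical, and modes are represented as the integers 0 through 3 (kernel through user) as an assumption for illustration.

```python
PROBE_REQUESTS = {"PROBE.V.RCHK", "PROBE.V.RCHKNOFILL", "PROBE.V.WCHK"}


def tb_check_mode(request: str, from_ibox: bool,
                  cur_mod: int, mmgt_mode: int) -> int:
    """Select the access mode used for a request's TB check."""
    if not from_ibox and request in PROBE_REQUESTS:
        return mmgt_mode  # probes check access in the mode held in MMGT.MODE
    return cur_mod        # everything else, including all Ibox requests
```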

The address for Ebox-initiated memory accesses comes from the VA register. The microword 
issuing the memory request may update the VA register. If it does, the new VA value is sent with 
the request. The write data for a memory request is the data put on E%WBUS_H<31:0>, a buffered 
copy of E_BUS%WBUS_L<31:0>, by the microword issuing the memory request.

The following table shows what the Ebox sends on each of the memory request information busses 
for each operation. 



Table 8-15: Ebox Memory Request Information Truth Table

Request             E%EBOX_       E%EBOX_    E%EBOX_       E%EBOX_        E%EBOX_      E%NO_MME_  Addr/Data
Mnemonic            CMD_H<4:0>    AT_H<1:0>  DL_H<1:0>(1)  TAG_H<4:0>(2)  VIRT_ADDR_H  CHECK_H    Sent?
------------------  ------------  ---------  ------------  -------------  -----------  ---------  ---------
READ.V.RCHK         DREAD         read       DL            DST            true         false      yes/no
READ.V.WCHK         DREAD         modify     DL            DST            true         false      yes/no
READ.V.NOCHK        DREAD         read       DL            DST            true         true       yes/no
READ.V.LOCK         DREAD_LOCK    modify     DL            DST            true         false      yes/no
READ.P              DREAD         read       DL            DST            false        —          yes/no
READ.PR             IPR_RD        —          DL            DST            false        —          yes/no
PROBE.V.RCHK        PROBE         read       Byte          DST            true         false      yes/no
PROBE.V.RCHKNOFILL  PROBE         (4)        Byte          DST            true         true       yes/no
WCHK                MME_CHK       write      —             —              true         false      yes/no
WRITE.V.WCHK        WRITE         write      DL            —              true         false      yes/yes
WRITE.V.NOCHK       WRITE         write      DL            —              true         true       yes/yes
WRITE.V.UNLOCK      WRITE_UNLOCK  write      DL            —              true         false      yes/yes
WRITE.P             WRITE         write      DL            —              false        —          yes/yes
WRITE.PR            IPR_WR        —          —             —              false        —          yes/yes
PROBE.V.WCHK        PROBE         write      Byte          DST            true         false      yes/no
STORE               STORE         —          —             —              false        —          no/yes
LOAD.PC             LOAD_PC       —          —             —              false        —          no/yes
SYNC.MBOX           NOP           —          Byte          —              false        —          no/no
TB.PTE.FILL         TB_PTE_FILL   —          Byte          —              false        true       yes(3)/no
TB.TAG.FILL         TB_TAG_FILL   —          Byte          —              false        true       yes/no
TB.INVAL.SINGLE     TBIS          —          Byte          —              false        true       yes/no
TB.INVAL.PROCESS    TBIP          —          Byte          —              false        true       no/no
TB.INVAL.ALL        TBIA          —          Byte          —              false        true       no/no

(1) DL means data length dictated by the microword; the DL register value unless the microword overrides the data length
to longword.

(2) DST means the tag is the register specified in the DST field of the microword.

(3) PTE data is sent on address bus through VA register.

(4) Special code: no access check is done. Only the presence of an entry in the TB is checked.

— means don't care, doesn't apply.



8.5.17.1 IO Read Synchronization 

Because the Ibox issues operand reads before the Ebox executes the associated macroinstruction, 
there is a possibility that an exception or branch will result in an operand read occurring even 
though the associated macroinstruction is never executed. This is not a problem if the read is to 
memory space, but it might be if the read is to IO space. Many IO space reads have side effects, 
so some mechanism is required which postpones an Ibox issued IO space read until the Ebox is 
actually executing the macroinstruction which requires the IO space read. The Mbox delays all 
IO space reads issued by the Ibox until the Ebox asserts the signal E%START_IBOX_IO_RD_H.




NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 



The Ebox asserts E%START_IBOX_IO_RD_H when the following are all true:

1. The Ebox is stalled in S3 waiting for a register file entry indexed through the source queue
(i.e., A/S1, A/S2, B/S1, or B/S2) to become valid, or E_FLQ%FQ_STALL_H is asserted,

2. there is exactly one entry in the retire queue, 

3. there is no stall of S4 of the RMUX part of the Ebox pipeline, 

4. conditions 1, 2, and 3 were true in the previous cycle, 

5. there is no MD fault for any of the MD registers currently being accessed in (stalled) S3, 

6. and the Ebox pipeline is not being flushed by a microtrap this cycle. 
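
The six conditions can be expressed as a small predicate over the state of two consecutive cycles. This is a sketch for illustration only; the CycleState fields and function names are hypothetical and simply label the quantities named in the list.

```python
from dataclasses import dataclass


@dataclass
class CycleState:
    s3_source_queue_stall: bool  # S3 stalled on a source-queue register file entry
    field_queue_stall: bool      # E_FLQ%FQ_STALL_H asserted
    retire_queue_entries: int    # number of entries in the retire queue
    rmux_s4_stall: bool          # stall of S4 in the RMUX part of the pipeline
    md_fault_in_s3: bool         # MD fault on a register accessed in (stalled) S3
    microtrap_flush: bool        # Ebox pipeline flushed by a microtrap this cycle


def conditions_1_to_3(c: CycleState) -> bool:
    return ((c.s3_source_queue_stall or c.field_queue_stall)
            and c.retire_queue_entries == 1
            and not c.rmux_s4_stall)


def start_ibox_io_rd(prev: CycleState, cur: CycleState) -> bool:
    """Model of when E%START_IBOX_IO_RD_H is asserted."""
    return (conditions_1_to_3(cur)
            and conditions_1_to_3(prev)      # condition 4
            and not cur.md_fault_in_s3       # condition 5
            and not cur.microtrap_flush)     # condition 6
```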

The Mbox processes specifier queue entries one at a time (the specifier queue is the queue in
the Mbox which receives all operand data references issued by the Ibox). If the specifier queue
entry is an IO space access, the Mbox will not process it unless S6 in the Mbox is idle (not
processing any reference) and S6 was idle in the previous cycle and E%START_IBOX_IO_RD_H is
asserted. (Note that a one cycle delay occurs in the Mbox on E%START_IBOX_IO_RD_H. This is why
the current cycle and the previous cycle are checked for NOP in S6 in the Mbox.) If the Ebox
is stalled waiting for read data to be put in an MD by the Mbox, and the Mbox is waiting for
E%START_IBOX_IO_RD_H to be asserted (because the specifier queue entry is an IO space read)
then the Ebox must be waiting for the result of that IO space read.

The Ebox only asserts E%START_IBOX_IO_RD_H when it is certain that the macroinstruction which
will use the result of the IO space read is going to execute. If the retire queue contains more than
one entry, other instructions are in the Ebox or Fbox pipeline so E%START_IBOX_IO_RD_H is not
asserted in case one of them incurs an exception. If the Ebox is stalled in (RMUX) S4, it doesn't
assert E%START_IBOX_IO_RD_H because the previous macroinstruction's result store may incur an
exception when it advances to S5. (Note that the retire queue entry is removed from the queue 
before the RMUX S4 stall status is known so that the RMUX S4 stall status has to be examined as 
well.) 

If the Ebox is being flushed by a microtrap in the current cycle, it doesn't assert 
E%START_IBOX_IO_RD_H because the previous macroinstruction actually had a trap. 

If there is an MD fault being reported in S3 of the Ebox, then the Ebox will take a microtrap 
after one cycle with no S4 stalls has passed. In the interim, E%START_IBOX_IO_RD_H must not be
asserted. 

Assertion of E%START_IBOX_IO_RD_H when field queue stall is present is necessary to avoid
deadlock; however, it will cause the CPU to start an IO space operand prefetch even when a
memory management fault will cause the instruction to fault. For example, this might occur
with ADAWI if the second operand is in IO space and the first can incur a memory management
fault.

8.5.17.2 Mbox-Ebox signals 

The Mbox drives the following control signals for Ebox use: M%EM_LAT_FULL_H
and M%PA_Q_STATUS_H<2:0>. M%EM_LAT_FULL_H tells the Ebox the EM_LATCH is full.
M%PA_Q_STATUS_H<2:0> gives the status of the current PA queue entry. M%PA_Q_STATUS_H<0>
indicates that sufficient entries are valid in the PA queue to accept a store request. Multiple
PA queue entries are needed for a store when the store will access multiple longwords in
memory (as in quadword length stores and unaligned stores which cross a longword boundary).
M%PA_Q_STATUS_H<1> indicates that the relevant PA queue entries have a memory management
fault associated with them. The Ebox will not issue the store; it will microtrap when the microcode
attempts it. M%PA_Q_STATUS_H<2> indicates the relevant PA queue entries have a hardware error
associated with them. The Ebox will not issue the store; it will microtrap when the microcode
attempts it.
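
To make the "multiple longwords" case concrete, the sketch below counts the aligned longwords a store touches. It assumes one PA queue entry per longword touched, which this section implies but does not state outright; the function name is hypothetical.

```python
def longwords_touched(address: int, length_bytes: int) -> int:
    """Number of aligned 4-byte longwords covered by a store."""
    if length_bytes <= 0:
        raise ValueError("length must be positive")
    first = address >> 2                      # longword index of the first byte
    last = (address + length_bytes - 1) >> 2  # longword index of the last byte
    return last - first + 1
```

An aligned longword store touches one longword, while the same store at an address ending in 2 crosses a longword boundary and touches two.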

In one case Ebox logic ignores M%PA_Q_STATUS_H<0> and behaves as if it is deasserted. Due to
complexities in the Mbox, M%PA_Q_STATUS_H<2:1> are not logically correct in the cycle in which
the Ebox aborts an EM_LATCH operation by asserting E%EM_ABORT_L. This happens when the Ebox
aborts an Fbox result store operation because of F%STORE_STALL_H (see Section 8.5.16.3).

Due to complexities in the Mbox, M%PA_Q_STATUS_H<2:1>, which signal memory management 
exceptions and hardware errors associated with the PA queue entry, are not always correct in a 
cycle in which an EM_LATCH operation is aborted by assertion of E%EM_ABORT_L. In this cycle,
the Ebox ignores M%PA_Q_STATUS_H<0> and behaves as if it is deasserted. M%PA_Q_STATUS_H<0>
qualifies every use of M%PA_Q_STATUS_H<2:1>, so the Ebox can't incorrectly take or not take an 
exception because of incorrect M%PA_Q_STATUS_H<2:1> values. 

The Ebox ignores M%PA_Q_STATUS_H<0> only in cycles in which a store of Fbox data was 
sent in the previous cycle and was aborted in this cycle by asserting E%EM_ABORT_L because
F%STORE_STALL_H was asserted. This may be coincident with an actual pipeline abort (which also
causes assertion of E%EM_ABORT_L if a request was sent to the Mbox in the previous cycle). In
this case the Ebox will ignore M%PA_Q_STATUS_H<0> in a cycle in which the microword in S4 is 
effectively a NOP, and no change in behavior will result. 

The Ebox stalls the microword in S4 if it specifies an Mbox request and the EM_LATCH is full.
Also, S4 is stalled if the microword specifies a store and M%PA_Q_STATUS_H<0> is not asserted. 
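
The store-side use of M%PA_Q_STATUS_H<2:0> described above can be summarized in a small behavioral sketch. It is an illustration only, not the Ebox logic: the function name, argument names, and result fields are assumptions made for this example, while the bit meanings are those given in the text.

```python
from typing import NamedTuple


class StoreDecision(NamedTuple):
    stall_s4: bool        # hold the store microword in S4
    mme_microtrap: bool   # memory management fault on the PA queue entries
    herr_microtrap: bool  # hardware error on the PA queue entries


def store_decision(pa_q_status: int, em_latch_full: bool,
                   fbox_store_aborted: bool = False) -> StoreDecision:
    """Model how a store microword in S4 reacts to M%PA_Q_STATUS_H<2:0>."""
    valid = bool(pa_q_status & 0b001)
    if fbox_store_aborted:
        # In the abort cycle the Ebox behaves as if <0> were deasserted.
        valid = False
    # <0> qualifies every use of <2:1>, so stale fault bits are harmless.
    mme = valid and bool(pa_q_status & 0b010)
    herr = valid and bool(pa_q_status & 0b100)
    stall = em_latch_full or not valid
    return StoreDecision(stall, mme, herr)
```

Because bit <0> gates bits <2:1>, a status of 110 (binary) produces only a stall, never a microtrap.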

The Mbox drives several signals and busses used in writing the data into the register file. 
These are M%EBOX_DATA_H, M%MD_BUS_H<31:0>, and M%MD_TAG_H<4:0>. When M%EBOX_DATA_H
is asserted, the data on M%MD_BUS_H<31:0> is written into the register addressed by
M%MD_TAG_H<4:0>. Note that M%MD_TAG_H<4:0> is 5 bits; it can address up to 32 locations. The
organization of the register file is such that the MD, Wn, and GPR registers (a total of 27 registers) 
are in the first 32 locations in the register file. This means they can be addressed with a 5-bit 
tag (which is mapped into the full 6-bit address by zero extension). 

The Mbox drives fault and error flags which are associated with the data on M%MD_BUS_H<31:0>:
M%MME_FAULT_H and M%HARD_ERR_H. If M%MME_FAULT_H or M%HARD_ERR_H is asserted when
M%EBOX_DATA_H is asserted, then a fault or error is being reported to the Ebox for some previously
initiated read operation. This is handled in one of several ways, depending on the case, as is 
shown in Table 8-16. 



Table 8-16: Ebox Response to M%MME_FAULT_H and M%HARD_ERR_H

M%MD_TAG_H<4:0>
Addresses:       Signal Asserted  Response

Wn or GPR        M%MME_FAULT_H    The Ebox ignores this case. M%MME_TRAP_L would have been
                                  asserted for the same fault in a previous cycle.
Wn or GPR        M%HARD_ERR_H     In this case the Ebox forces an immediate hardware error
                                  microtrap.
MD               M%MME_FAULT_H    In this case the fault bit for the particular MD is set in the
                                  register file.
MD               M%HARD_ERR_H     In this case the error bit for the particular MD is set in the
                                  register file.
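
The four cases of Table 8-16 can be written as a short lookup. This is a sketch for illustration: the function name and the action strings are invented here and are not signal or microcode names from this specification.

```python
from typing import List


def ebox_read_fault_response(target: str, mme_fault: bool,
                             hard_err: bool) -> List[str]:
    """Return the Ebox actions for a register-file write flagged by the Mbox.

    target is "MD", "Wn", or "GPR": the kind of register that
    M%MD_TAG_H<4:0> addresses.
    """
    actions = []
    if target == "MD":
        # Faults and errors on MD registers are recorded, not trapped on.
        if mme_fault:
            actions.append("set_md_fault_bit")
        if hard_err:
            actions.append("set_md_error_bit")
    else:
        # Wn or GPR: the fault was already trapped via M%MME_TRAP_L,
        # so only a hardware error needs a response here.
        if hard_err:
            actions.append("hardware_error_microtrap")
    return actions
```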



The Mbox drives M%MME_TRAP_L and M%TB_PERR_TRAP_L to force immediate microtraps. 
M%MME_TRAP_L causes a memory management exception microtrap, while M%TB_PERR_TRAP_L 
causes an asynchronous hardware error microtrap. 

The Ebox asserts certain Mbox control signals under the control of the MISC and MISC2 
fields of the microword. These signals are E%FLUSH_MBOX_H, E%FLUSH_PA_QUEUE_H, and 
E%RESTART_SPEC_QUEUE_H. E%FLUSH_MBOX_H is asserted when MISC/RESET.CPU is specified. It
causes the Mbox to flush ongoing Ebox reads, including those initiated by the Ibox. It also flushes 
the specifier queue. It does not flush the PA queue, so writes and stores already issued by the 
Ebox are not affected. 

E%RESTART_SPEC_QUEUE_H is asserted when MISC/RESTART.MBOX is specified. It restarts Mbox
processing of specifier queue references. Mbox specifier queue processing is stopped by Ibox 
request when certain complex macroinstructions are encountered. 

E%FLUSH_PA_QUEUE_H is asserted when MISC2/FLUSH.PAQ is specified. It causes the PA queue in
the Mbox to be flushed. MISC2/FLUSH.PAQ should always be specified with an MRQ field request
which causes an EM latch command (i.e., other than MRQ/SYNC.BDISP, MRQ/SYNC.BDISP.RETIRE,
MRQ/SYNC.BDISP.TEST.PRED, or MRQ/NOP).

When a pipeline abort occurs, the Ebox asserts E%EM_ABORT_L, conditionally. It is asserted 
because the abort is recognized too late to prevent the Ebox from issuing an Mbox request in S4. 
E%EM_ABORT_L is asserted in S5 and signals the Mbox to disregard the command just sent in S4. 
It is only asserted if the Ebox actually made an Mbox request in S4 and the EM_LATCH wasn't
full. Even stores and write requests are aborted in this case. 

8.5.17.3 Ibox IPR Access and LOAD PC 

The Ebox detects Ibox IPR access requests in S5. At that time it asserts a command strobe to 
the Ibox. The Mbox will also detect that the IPR access is to the Ibox. It will treat an Ibox IPR 
read as a NOP. For IPR writes the Mbox forwards the data on M%MD_BUS_H<31:0> in a later cycle.
Microcode synchronizes with Ibox IPR writes by issuing an MRQ/SYNC.MBOX after the operation.
Once the MRQ/SYNC.MBOX is complete, the microcode knows the Ibox has the data.

In detecting Ibox IPRs, the Ebox treats the entire range of normal IPR addresses from D0 to DF
(hex) as Ibox IPRs. The exact test used by the Ebox is: VA<9:6>=D (hex) and VA<24>=0. The low
four bits (VA<5:2>) are sent to the Ibox so it can determine which of its IPRs is specified. 
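
The address test above amounts to two bit-field comparisons. The sketch below is illustrative only; the function names are hypothetical, and the test values assume that an IPR number occupies VA<9:2>, which is what the bit fields quoted in the text imply.

```python
def is_ibox_ipr(va: int) -> bool:
    """True when VA<9:6> = D (hex) and VA<24> = 0."""
    return ((va >> 6) & 0xF) == 0xD and ((va >> 24) & 1) == 0


def ibox_ipr_select(va: int) -> int:
    """Low four bits of the IPR number, VA<5:2>, as sent to the Ibox."""
    return (va >> 2) & 0xF


# IPR numbers D0..DF (hex) placed in VA<9:2> all pass the test.
assert all(is_ibox_ipr(n << 2) for n in range(0xD0, 0xE0))
```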

The Ebox requests a load-PC Mbox operation in S4 when the microword specifies LOAD.PC in the 
MRQ field. In S5 of that microword it asserts a command strobe to the Ibox informing it that the 
Mbox will soon forward the new PC value. Microcode synchronizes with the load-PC operation by 
specifying a SEQ.MUX/LAST.CYCLE. The instruction queue must be empty at this time. Once the 
Ibox adds a new instruction queue entry, a macroinstruction dispatch occurs. While waiting, the 
Ebox executes a continuous stream of "STALL" microwords (see Section 8.5.20.1).

8.5.18 Ebox Vector Support

The Ebox supports potential future vector architecture integration by providing a configuration 
status bit which is available for microcode conditional branches. 

VECTOR_UNIT_PRESENT is a configuration status bit for vector support in the IPR ECR. See 
Section 8.5.22. Microcode can branch conditionally on the VECTOR_UNIT_PRESENT status. 

8.5.19 Fault and Trap Management 

There are three kinds of VAX Architecture exceptions: faults, aborts, and traps. In all cases the 
PC, PSL, and other data are pushed on the stack, and the address of an exception handling routine
is fetched from the SCB. For a trap, the instruction which caused the trap has finished completely, 
and the PC on the stack points to the next instruction to execute. For a fault, the PC on the stack 
points to the instruction which caused the exception. For an abort, the PC, PSL, and other state 
are UNPREDICTABLE; however, whenever possible the NVAX CPU tries to turn aborts into faults. 
The difference between an abort and a fault is that no important architecturally visible state was
modified by the instruction if it was a fault, while some important architecturally visible state may 
have been modified if it was an abort. (Certain state, for example, the memory location which 
is pointed to by the stack pointer, can be modified in the case of a fault. Generally speaking, 
aborts are cases where restarting the instruction may not work because some state which the 
instruction depended on may have been altered.) The VAX Restart Bit in the machine check stack 
can be used in determining whether it is safe to treat an abort as a fault. 

To cleanly support the concepts described in the previous paragraph, the NVAX CPU has a
macroinstruction commit point in the pipeline. Once any microword of the execution microflow 
has passed this point, the macroinstruction may have modified architectural state. Until the 
first microword of the microflow passes the commit point, the instruction cannot have modified 
any architectural state. This point is the boundary between S4 and S5 in the Ebox pipeline. No 
architecturally visible state is ever modified in S3 or S4 of the pipeline. For example, the PSL 
and all registers in the register file are written only in S5. Also, memory requests are not issued 
until a microword specifying one is about to advance into S5, and it is certain there are no S4 
stalls. 

Each macroinstruction execution microflow obeys the restriction that no microword in that 
flow modifies any architectural state before it is certain that all the operand specifiers for the 
instruction have been properly fetched and decoded and that all the memory accesses which this 
microflow will request are not going to encounter a memory management violation. This does not 
mean that no microword of the microflow passes the S4/S5 boundary before all this is checked. It 
only means that the microwords in the microflow don't write memory or any other architecturally 
visible state until these things are verified. The net result is that macroinstructions which 
encounter a memory management violation are restartable once the condition has been corrected. 
(Note that the string instructions don't quite follow these simple rules. Instead, they use a more 
elaborate set of rules to ensure that they can be restarted after any memory management fault.) 

Microflows for macroinstructions which might encounter any kind of fault other than a 
memory management exception specifically test for the fault condition(s) before modifying any
architectural state. This is in addition to checking for memory management faults, as described 
above. 

Ebox hardware forces a reserved opcode fault for Fbox instructions (except MULL) when the Fbox
is disabled, in S4 of the first microword of Fbox macroinstruction execution flows. Because this 
fault is requested in S4 of the first microword, it prevents any architectural state from being 
altered by these flows. 

Hardware errors are handled differently. They generally can't be checked for, and the architecture 
doesn't require any such checks. Generally aborts occur as a result of a hardware error 
encountered in a macroinstruction after all memory management checks have been done. In
these cases, some architecturally visible state may have been modified before the macroinstruction 
has completed. 

8.5.19.1 Faults and Errors Detected In S4 

When the Ebox detects a fault or error condition in S4 associated with an operation that is about 
to advance into S5, it signals the Microsequencer to cause a microtrap. The microtrap will cause 
the Ebox pipeline to abort before it advances. Any operation which was in S5 already completes 
normally, but the operation in S4 is purged before it enters S5. The operation in S5 may be part 
of a previous macroinstruction microflow. That macroinstruction is not affected by the microtrap. 
The microword in S4 may be the first microword to modify architecturally visible state in a given 
execution microflow so it must be prevented from advancing into S5. 

8.5.19.1.1 Coordinating Ebox and Fbox Faults and Errors 

It is necessary that macroinstructions retire in order, even when there is a fault or error detected 
in S4. The microtrap for the fault or error must be delayed until the macroinstruction connected 
to the fault or error is next to retire. The current retire queue entry is used by the Ebox to 
decide whether a microtrap should be signaled. For example, if a branch displacement access 
fault or error is detected by the Ebox in S4 but the retire queue indicates the Fbox is next to 
retire a macroinstruction, then the branch macroinstruction came after the one being executed 
in the Fbox. The branch's fault or error must not cause a microtrap until the Fbox has retired 
its macroinstruction. Then the microtrap is forced, given that the next entry in the retire queue 
indicates the Ebox is next to retire a macroinstruction. The microtrap occurs in S4 after the 
Fbox's last operation advances into S5. The branch is prevented from retiring by the microtrap, 
since it incurred a fault or error. 
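
The in-order rule above reduces to comparing the box that owns the pending fault with the box named by the current retire queue entry. The following is a minimal sketch under that reading; the function name and the "E"/"F" encoding of retire queue entries are assumptions made for the example.

```python
from typing import Optional


def microtrap_may_be_signaled(fault_box: Optional[str],
                              retire_head: Optional[str]) -> bool:
    """fault_box: "E" or "F" for the box whose macroinstruction incurred the
    fault or error, or None if nothing is pending.
    retire_head: the box named by the current retire queue entry."""
    if fault_box is None:
        return False
    # The microtrap waits until the faulting macroinstruction is next to retire.
    return retire_head == fault_box
```

In the branch example from the text, microtrap_may_be_signaled("E", "F") is false while the Fbox instruction is still ahead in the retire queue, and becomes true once the current entry names the Ebox.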

The Fbox reports a number of faults and one error to the Ebox. The Ebox ignores them until 
the retire queue indicates the Fbox is next to retire a macroinstruction. The reason is the same 
as in the previous paragraph. The microtrap has to be delayed until the logically preceding 
macroinstructions are advanced into S5. 

Destination queue and PA queue faults and errors can be connected either to the Ebox or the 
Fbox. It depends on whether the box selected by the retire queue is requesting a destination 
queue indirect store. If the destination queue is empty and I%IMEM_MEXC_H, I%IMEM_HERR_H,
or I%RSVD_ADDR_FAULT_H is asserted and the box indicated by the retire queue is requesting a 
destination queue store, then a microtrap is signaled immediately. Also, if a destination queue 
store is requested while the current destination queue entry is valid and M%PA_Q_STATUS_H<1> 
or M%PA_Q_STATUS_H<2> is asserted, a microtrap is taken (see Section 8.5.17). 

8.5.19.1.2 Breaking the S4 Stall

Other than the requirement that instructions retire in order, S4 stalls do not delay microtraps 
for faults or errors which are in S4. In other words, any S4 stall is broken once a fault or error 
in S4 is due to the next macroinstruction to retire. 

8.5.19.2 Faults and Errors detected In S3 

When the Ebox detects a fault or error condition in S3, it latches it in order to carry it down 
the pipeline to S4. Unlike most control signals propagating down the pipeline, these fault and 
error conditions are not forced off when S3 is stalled and S4 isn't stalled. So the S3 stall doesn't 
have to end for the fault/error condition to propagate to S4. However, the fault/error conditions 
do stall in S3 if there is an S4 stall. This is because the microword in S4 may be from a previous 
macroinstruction. That instruction must be allowed to complete normally before the microtrap. 
Once the fault or error status has advanced into S4 and the retire queue indicates the Ebox is 
next to retire a macroinstruction, the Ebox signals the Microsequencer to microtrap. 

8.5.19.3 Integer Overflow and Branch Mispredict Traps 

There are two traps handled in Ebox hardware. They are integer overflow traps, and branch 
misprediction traps. Integer overflow traps are VAX Architecture exceptions, while branch 
misprediction traps are not part of the VAX architecture. Both of these traps are handled in the 
Ebox by causing a microtrap once the last microword of the macroinstruction's execution microflow 
has entered S5. The microtrap prevents the next microword (which is the first microword of a new 
microflow) from advancing into S5. This means that the macroinstruction in question completes 
properly but its successors are not allowed to execute. This is done for integer overflow because 
this is the effect required by the VAX Architecture. It is done for branch misprediction because 
this is the effect required to recover from an incorrectly predicted conditional branch. 

Integer overflow traps occur when a microword which specifies SEQ.MUX/LAST.CYCLE.OVERFLOW
is in S5 and PSL<IV> and PSL<V> are both set. If a microtrap is signaled,
it prevents the next microword (or Fbox operation) from advancing into S5; the current operation 
in S5 completes regardless of whether the microtrap is signaled. 

Of the VAX architecture instructions which can cause integer overflow, MULL and all the CVT 
instructions are executed in the Fbox (except that MULL is executed in the Ebox when the Fbox 
is disabled). Integer overflow is detected in the Fbox for these instructions. The Ebox determines 
that an integer overflow occurred by examining the new PSL<V> bit for every Fbox retiring 
instruction. To distinguish instructions which can incur integer overflow traps from others the
Fbox might retire, the Ebox checks the map specifier supplied by the Fbox. MULL and CVTs with
integer destinations all use the same map specifier, and no other Fbox executed instruction uses
that particular specifier. When the instruction being retired by the Fbox uses that particular
map specifier, and PSL<IV> and PSL<V> are both set, the Ebox forces the microtrap for integer 
overflow. 
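
The two ways an integer overflow microtrap can be raised share one condition on the PSL. The sketch below illustrates this; the function and argument names are hypothetical, and the bit positions used are the architectural PSL positions of V (bit 1) and IV (bit 5).

```python
PSL_V = 1 << 1   # overflow condition code
PSL_IV = 1 << 5  # integer overflow trap enable


def integer_overflow_trap(psl: int,
                          last_cycle_overflow_in_s5: bool = False,
                          fbox_retiring_integer_result: bool = False) -> bool:
    """True when the Ebox should force the integer overflow microtrap."""
    both_set = (psl & (PSL_V | PSL_IV)) == (PSL_V | PSL_IV)
    # Either an Ebox microword with SEQ.MUX/LAST.CYCLE.OVERFLOW is in S5, or
    # the Fbox is retiring an instruction with the integer-result map specifier.
    return both_set and (last_cycle_overflow_in_s5
                         or fbox_retiring_integer_result)
```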

Branch misprediction traps are taken in S5 when the microword specifies SYNC.BDISP.TEST.PRED 
and the branch condition evaluator determines that the branch was incorrectly predicted. The 
Ibox prediction is read from the branch queue in S4. The branch condition evaluator result is 
available in S5. If the prediction doesn't match the actual result, a branch misprediction microtrap 
is signaled. The microtrap will prevent the microword in S4 from completing. That microword 
may have been the first microword of the execution microflow for the next macroinstruction. It is 
not supposed to be executed because the Ibox incorrectly predicted the outcome of the conditional
branch. For more on mispredicted branches see Section 8.5.15.5. 

If a branch mispredict is detected at the same time as an integer overflow, the integer overflow 
microtrap is taken. See Section 8.5.19.5. 
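
A compact way to view the two S5 traps and their relative priority is shown below. This is a sketch only; the function name and the returned strings are assumptions for the example, not dispatch names from this specification.

```python
from typing import Optional


def s5_microtrap(predicted_taken: bool, actually_taken: bool,
                 integer_overflow: bool) -> Optional[str]:
    """Pick the S5 microtrap taken for SYNC.BDISP.TEST.PRED, if any."""
    mispredicted = predicted_taken != actually_taken
    if integer_overflow:
        # Integer overflow has priority over branch mispredict.
        return "integer_overflow"
    if mispredicted:
        return "branch_mispredict"
    return None
```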

8.5.19.4 Ebox Microtrap Handling 

The Ebox makes a microtrap request by asserting one of a number of microtrap request signals. 
The Microsequencer causes a microtrap at the end of the current cycle. The Microsequencer has 
a priority encoder which it uses to decide which microtrap dispatch should be taken when more 
than one microtrap request is asserted (see Chapter 9). Regardless of which microtrap is taken, 
the signal E_USQ%PE_ABORT_L is asserted, causing an effective no-op to be inserted into all the
control latches in S3, S4, and S5. The result is a pipeline abort. 

Early in a pipeline abort cycle (the cycle in which all the control latches in the pipeline are 
flushed), the Microsequencer asserts E_USQ%PE_ABORT_L. The Ebox responds by flushing
the retire queue and the Ebox pipeline. Also, if in the last cycle a new command had been accepted
by the Mbox, the Ebox asserts E%EM_ABORT_L which aborts that command. (E%EM_ABORT_L will
abort any EM_LATCH entry.)

In the case of a branch mispredict microtrap, the Ibox has already been signaled by the Ebox 
that a mispredict occurred. The Ibox has the alternate PC latched, and it will begin fetching from 
that location as soon as it has unwound the RLOG. See Chapter 7 for more detail. 

All microtrap flows except branch mispredict execute a RESET.CPU. This causes a flush or reset of 
the Ebox queues and register file valid bits, the Fbox, and the Mbox (except the PA queue and 
EM_LATCH). It also causes E%STOP_IBOX_H to be asserted. These microtrap flows then read the
Ibox IPR which causes the RLOG to be unwound and returns the backup PC. 

The branch mispredict microflow doesn't execute a RESET.CPU because the Ibox automatically 
recovers from the branch mispredict and begins fetching instructions from the correct memory 
location. For the same reason, it does not read the Ibox IPR which causes the RLOG to be unwound 
and returns the backup PC. For branch mispredict, Ebox hardware asserts all the flush or reset 
signals that MISC/RESET.CPU would have caused except that E%STOP_IBOX_H is not asserted.

All microtrap flows synchronize with the Mbox by executing MRQ/SYNC.MBOX. Then they execute 
a MISC2/FLUSH.PAQ which causes the PA queue in the Mbox to be flushed. This allows any stores
which were pending in the EM_LATCH to be finished before the PA queue is flushed.

Certain microcode rules and restrictions apply to the process of gathering state and flushing the 
various boxes and function units within boxes. See Section 8.5.27.18. 

8.5.19.5 Coincidence of Branch Mispredict Trap with other Traps 

It is possible for a branch mispredict trap to happen at the same time as an integer overflow trap. 
When this occurs, the integer overflow trap is taken because it has higher priority than branch 
mispredict. However, the Ibox is still signaled that a branch mispredict took place. In the few 
cycles that it takes for the MISC/RESET.CPU in the integer overflow microflow to arrive at S5 in 
the Ebox pipeline, the Ibox has begun unwinding the RLOG and correcting the backup PC queue. 
Once the Ibox starts this process, it delays its own response to the E%STOP_IBOX_H signal (which
is asserted by MISC/RESET.CPU) until it has completed the correction process for the mispredicted 
branch. In this way, the correct backup PC is made available to the integer overflow microflow. 

It is also possible for a trace fault to follow a mispredicted branch. In this case, the branch
mispredict trap flushes the pipeline (purging the microflow for a trace fault which is following it
down the pipeline) and the Ibox unwinds the RLOG and corrects the backup PC queue. Then the
branch mispredict microflow executes a LAST.CYCLE which causes the Microsequencer to dispatch
to the trace fault handler. Early in the trace fault microflow, RESET.CPU will be executed, and so
E%STOP_IBOX_H may be asserted to the Ibox before it has finished correcting for the mispredicted
branch. The Ibox's ability to delay its response to E%STOP_IBOX_H is what allows the Ibox to
finish its corrective action.

8.5.19.6 Possible Microtrap Requests

The following table lists the microtrap requests the Ebox can make. 



Table 8-17: Ebox Microtrap Requests

                          When
Microtrap                 Asserted  Sources                                      Signal

Memory management fault   S4        Ibox signal, MD fault status bits, PA queue  E_FLT%MME_ERR_H
                                    fault bit, or indicated by Fbox signal,
                                    F%MMGT_FAULT_H
Memory access error       S4        Ibox signal, MD fault status bits, or        E_FLT%HW_ERR_H
                                    indicated by Fbox signal, F%8£KBR^H
Reserved addressing mode  S4        Ibox signal or indicated by Fbox             E_FLT%RSVD _>VJ>DKJMK>D^^H
Reserved operand fault    S4        Indicated by Fbox                            E_FLT%FLOATING_FAULT_H
Reserved instruction      S4        For floating point macroinstructions when    E_FLT%RSVD_INSTR_L
fault                               the Fbox is not enabled.
Branch misprediction      S5        Branch result mismatch                       E%BHANCHJ«ISPREI)ICTJ^
trap
Integer overflow trap     S5        PSL<V> and PSL<IV> both set, and             E_FLT%IOVFL_L
                                    SEQ.MUX/LAST.CYCLE.OVERFLOW or Fbox map
                                    specifier indicates integer result
Floating overflow fault   S4        Indicated by Fbox                            E_FLT%FLOATING_FAULT_H
Floating underflow fault  S4        Indicated by Fbox                            E_FLT%FLOATING_FAULT_H
Floating divide-by-zero   S4        Indicated by Fbox                            E_FLT%FLOATING_FAULT_H
fault



8.5.19.7 Fbox Fault Reporting 

The four Fbox faults, reserved operand, floating overflow, floating underflow, and floating 
divide-by-zero all cause the same dispatch in the Microsequencer. The Ebox latches a priority 
encoded status when one of these faults is reported by the Fbox. This status is available to the 
trap handler via a microbranch. The priority order, from highest to lowest, is reserved operand, 
floating divide-by-zero, floating overflow, and floating underflow. Table 8-18 shows the code for 
each of the four fault conditions. 

Table 8-18: Fbox Fault Codes

Fault                    Priority  Code

Reserved operand         1         0
Floating divide-by-zero  2         1
Floating overflow        3         2
Floating underflow       4         3
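
The priority encoding of Table 8-18 can be sketched as follows. The fault names used as strings and the function name are illustrative choices; the priority order and the codes are those listed in the table.

```python
from typing import Iterable, Optional

# Highest priority first, paired with the code from Table 8-18.
FBOX_FAULT_CODES = (
    ("reserved_operand", 0),
    ("floating_divide_by_zero", 1),
    ("floating_overflow", 2),
    ("floating_underflow", 3),
)


def fbox_fault_code(reported: Iterable[str]) -> Optional[int]:
    """Return the code latched for the highest-priority reported fault."""
    reported = set(reported)
    for name, code in FBOX_FAULT_CODES:
        if name in reported:
            return code
    return None  # no Fbox fault reported
```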



8.5.20 Ebox Stalls 

The Ebox pipeline is controlled by the Ebox stall logic. It supplies stall signals which gate clocking 
of data information into each pipeline stage. The Ebox stall logic stalls only the segments which 
must be stalled. S5 is never stalled. S3 stalls if S4 is stalled. If S3 is stalled but S4 is not stalled, 
a NULL microword (or, more generally, an effective no-op) is injected into S4 after the control 
information in S4 advances into S5. 

The clock for each pipeline latch in S3 and S4 is Φ1 gated by a stall control signal. The stall
control signals are E%S3_STALL, E%S4_STALL, and E%RMUX_S4_STALL for stages S3, S4, and RMUX
S4 respectively. These signals determine whether the corresponding latches are opened in 

The stall control signals are used to stall a pipe stage. A stage is stalled when it cannot complete 
its operation for some reason. Generally data needed by the stage is not yet valid, but is expected 
to become valid after some time. Also stage N will be stalled when stage N+1 is not ready to
receive the output of stage N. 

The Ebox pipeline can be stalled while the Fbox uses the RMUX portion of the pipeline to store 
results. When the Fbox is next to retire an instruction, E%RMUX_S4_STALL, E%RMUX_S4_FLUSH, 
and E%RMUX_S5_FLUSH depend on the progress of Fbox result store operations. When the Ebox 
is next to retire, these signals are driven to the same logic level as E%S4_STALL, E%S4_FLUSH, and 
E%S5_FLUSH, respectively. 

The clock for S5 pipeline latches is not gated. However there is an S5 flush signal for control 
information and another flush signal for the output of the RMUX. 

The S3, S4, and S5 pipeline latches which hold control information also have an 
asynchronous reset input signal: E%S3_FLUSH, E%S4_FLUSH, E%RMUX_S4_FLUSH, E%S5_FLUSH, 
and E%RMUX_S5_FLUSH. These signals clear (flush) the control information to an effective no-op. 
They are asserted after the clock which loads the latch but before the control information is used 
to alter any state in the Ebox or anywhere else in the NVAX CPU. 

The flush control signals are used to insert effective no-ops into a particular stage. This is done 
for two distinct reasons. First, when pipeline stage N is stalled but stage N+1 is not stalled,
an effective no-op is inserted into stage N+1 as its current operation advances to stage N+2.
Secondly, when a pipeline flush is needed, the flush signals are all asserted, so every stage of the 
pipeline has an effective no-op inserted. The Ebox flushes the pipeline when the Microsequencer 
asserts E_USQ%PE_ABORT_L (which indicates that a microtrap dispatch has been initiated). 

Figure 8-8 shows control and data path latches and how the various pipeline control signals are 
typically connected. 

Figure 8-8: Ebox Pipeline Latches

[Figure omitted: S3, S4, and S5 pipeline latches for DATA and CONTROL (the microword and its
decodes, and the Fbox control signals), with the RMUX data part and RMUX control part and
their stall and flush signals.]

Table 8-19 shows the various pipeline stall and flush combinations which can occur. An important 
factor in determining the pipeline controls is whether the Ebox or Fbox is next to retire a 
macroinstruction. This status is given by the current retire queue entry. 



Table 8-19: Ebox Pipeline Stall and Flush Cases

Ebox Next to Retire a Macroinstruction

Pipeline Control        S3 Clock/               S4 Clock/               RMUX S4 Clock/          S5 Flush/
Case                    S3 Flush                S4 Flush                RMUX S4 Flush           RMUX Flush

No Stalls               run/don't flush         run/don't flush         run/don't flush         don't flush/don't flush
S3 Stall (with no S4    stall/don't flush       run/flush               run/flush               don't flush/don't flush
stall)
S4 Stall                stall/don't flush       stall/don't flush       stall/don't flush       flush/flush
Pipeline Abort          run/flush               run/flush               run/flush               flush/flush

Fbox Next to Retire a Macroinstruction (note 1)

Pipeline Control        S3 Clock/               S4 Clock/               RMUX S4 Clock/          S5 Flush/
Case                    S3 Flush                S4 Flush                RMUX S4 Flush           RMUX Flush

Ebox requesting rmux    stall/don't flush       stall/don't flush       see note 3/don't flush  flush/see note 4
(in S4) (note 2), or
Ebox S4 stall
Ebox not requesting     see note 5/don't flush  see note 5/don't flush  see note 3/don't flush  don't flush/see note 4
rmux (in S4) and no
S3 stall (note 2)
Ebox not requesting     stall/don't flush       see note 5/flush        see note 3/don't flush  don't flush/see note 4
rmux (in S4) with
S3 stall (note 2)

1 If Fbox is next to retire a macroinstruction, then the rmux always selects the Fbox even if the Fbox doesn't request it.

2 The Ebox is requesting the rmux if the microword in S4 specifies anything other than NONE in the DST field.

3 Run if Fbox not requesting RMUX or if Fbox is requesting and there is no stall on the operation. Stall if Fbox is requesting
a store and/or retire and there is a stall on the operation.

4 Don't flush if Fbox not requesting RMUX or if Fbox is requesting and there is no stall on the operation. Flush if Fbox is
requesting a store and/or retire and there is an RMUX S4 stall on the operation.

5 Stall if RMUX S4 clock is stalled. Otherwise run.



As is shown in Table 8-19, when an effective no-op is inserted into S4 during an S3 stall, S5 does
not need to be flushed. The effective no-op in S4 will propagate into an effective no-op in S5.
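
The cases of Table 8-19 in which the Ebox is next to retire can be expressed as a small decision function. This is a behavioral sketch, not the stall logic itself: the function name, the dictionary keys, and the "run"/"stall"/"flush" strings are assumptions made for the example. The Fbox-next-to-retire cases are not modeled.

```python
def ebox_pipeline_controls(s3_stall: bool, s4_stall: bool, abort: bool) -> dict:
    """Clock and flush controls when the Ebox is next to retire.

    Each stage maps to (clock, flush); s5_flush and rmux_flush are booleans.
    """
    run, stall = "run", "stall"
    keep, flush = "dont_flush", "flush"
    if abort:
        # Pipeline abort: every stage receives an effective no-op.
        return {"s3": (run, flush), "s4": (run, flush),
                "rmux_s4": (run, flush), "s5_flush": True, "rmux_flush": True}
    if s4_stall:
        # S3 stalls whenever S4 stalls; a no-op is inserted into S5.
        return {"s3": (stall, keep), "s4": (stall, keep),
                "rmux_s4": (stall, keep), "s5_flush": True, "rmux_flush": True}
    if s3_stall:
        # S4 drains into S5 and is refilled with an effective no-op.
        return {"s3": (stall, keep), "s4": (run, flush),
                "rmux_s4": (run, flush), "s5_flush": False, "rmux_flush": False}
    return {"s3": (run, keep), "s4": (run, keep),
            "rmux_s4": (run, keep), "s5_flush": False, "rmux_flush": False}
```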

VERIFICATION NOTE 

The interaction between stalls and microbranches is different than Rigel. That all 
microbranch tests work properly when S3 is stalled and S4 is not stalled should be 
verified carefully. 



8.5.20.1 The STALL Microword 

In any cycle that the instruction queue is empty (and the Ibox is not providing a bypassed 
instruction queue entry directly to the Microsequencer), the Microsequencer fetches the STALL 
microword. This microword specifies no operation, except SEQ.MUX/LAST.CYCLE, and can't cause a 
stall anywhere in the pipeline. This allows the microwords already in the pipeline to continue 
even when the Ibox is temporarily unable to supply new instruction execution dispatches. See 
Chapter 9 for more detail. 

8.5.20.2 Field Queue Stall

When microcode uses the field queue, it executes a 4-way conditional microcode branch on two 
conditions, a not-empty condition and the 1-bit status in the current field queue entry. Only three 
of the 4 branch outcomes are actually possible, because the output of the field queue is forced 
off if the queue is empty. The Ibox makes an entry into the field queue when it processes a
field operand. While the queue is empty, the microcode loops continuously repeating the same 
conditional branch. This is very much like a stall condition in that the pipeline stages all have the 
same operation in them in every cycle while the field queue remains empty. See Section 8.5.15.8 
for more on field queue operation. 
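
The reason only three of the four branch outcomes are reachable can be seen in a short sketch. The list-based queue and the function name are assumptions made for the example; the forcing of the status bit to zero when the queue is empty is taken from the text.

```python
from typing import List, Tuple


def field_queue_branch(entries: List[int]) -> Tuple[int, int]:
    """Return (not_empty, status) as seen by the 4-way microbranch.

    entries holds the 1-bit status of each queued field operand, oldest first.
    """
    not_empty = 1 if entries else 0
    # The queue output is forced off while the queue is empty.
    status = (entries[0] & 1) if entries else 0
    return (not_empty, status)
```

The outcome (0, 1) never occurs, so microcode loops on (0, 0) until the Ibox makes an entry.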

8.5.20.3 Ebox Stall Conditions 

The Ebox stall logic detects the need for stalls in various parts of the pipeline. The stalls must 
be detected in time to gate Φ1 latches at the start of the next cycle. This section assumes the
Ebox is next to retire a macroinstruction. The next section deals with stalls with the Fbox next 
to retire a macroinstruction. 

The Ebox pipeline stalls in S3 when it is accessing some data in the register file which is not 
valid or when it requires an entry in the source queue which is not available. Up to two source 
queue entries and up to two MD or Wn registers can be accessed at once. The S3 stall lasts until 
all the accessed elements are valid and available. 

Wn and MD registers have valid bits associated with them. A register is valid only if this bit is 
set. A register's valid bit is not set when a memory read has been initiated for that register and 
hasn't yet completed. The valid bit is set by the Mbox when the read completes. 

The source queue read and write pointers are examined to determine when there are sufficient 
source queue entries to satisfy the microword in S3. Either one or two entries might be needed. 
Only one is needed if the source queue is referenced in the A or B microword fields but not both. 
Two are needed if the source queue is referenced in both microword fields. The Ebox stalls in S3 
if fewer entries are present than are needed. In particular, if only one entry is needed, 
then the Ebox stalls only if the source queue is completely empty, and if two entries are needed, 
the Ebox stalls until two entries are present. 
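
The source queue rule above can be sketched as a small predicate. This is an illustrative model only, not the hardware logic; the function and parameter names are hypothetical, and the number of available entries is assumed to have been derived already from the read and write pointers.

```python
def source_queue_entries_needed(a_uses_sq: bool, b_uses_sq: bool) -> int:
    """One entry per microword field (A, B) that references the source queue."""
    return int(a_uses_sq) + int(b_uses_sq)


def s3_source_queue_stall(a_uses_sq: bool, b_uses_sq: bool,
                          entries_available: int) -> bool:
    """True if S3 must stall because too few source queue entries are present."""
    needed = source_queue_entries_needed(a_uses_sq, b_uses_sq)
    return entries_available < needed
```

With one field referencing the queue, only an empty queue stalls S3; with both fields referencing it, a single available entry is still a stall.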

The Ebox stalls in S3 if the microword in S3 is sending operands to the Fbox and the Fbox is 
indicating that it can't accept any more operands. 

The Ebox stalls in S3 if the microword in S3 is accessing at least one GPR which is marked in the 
Fbox destination scoreboard as having an Fbox result store pending. 

Given that the retire queue indicates the Ebox is next to retire a macroinstruction, the Ebox 
stalls in S4 if the following are true: 

• The microword in S4 specifies DST/DST. 

• The destination queue is empty, or the destination queue isn't empty, the destination queue 
entry indicates a memory store, and the current PA queue entry is not valid. 

The destination queue read and write pointers are examined to determine when the destination 
queue is empty. The current PA queue entry is valid when the Mbox has completed memory 
management checks for the store reference. The Mbox asserts M%PA_Q_STATUS_H<0> when the 
PA queue entry is valid. 
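
As a minimal sketch, the DST/DST stall condition just described can be written as follows. It models only this one S4 stall, assumes each input has already been resolved to a boolean, and uses hypothetical names throughout.

```python
def s4_dst_stall(ebox_next_to_retire: bool,
                 microword_dst_is_dst: bool,
                 dest_queue_empty: bool,
                 dest_entry_is_memory_store: bool,
                 pa_entry_valid: bool) -> bool:
    """True if the Ebox stalls in S4 waiting on the destination or PA queue."""
    if not (ebox_next_to_retire and microword_dst_is_dst):
        return False
    if dest_queue_empty:
        # No destination queue entry to consume yet.
        return True
    # A memory store also needs a PA queue entry that has passed
    # the Mbox memory management checks.
    return dest_entry_is_memory_store and not pa_entry_valid
```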






The Ebox stalls in S4 if the microword in S4 initiates a memory operation and the Mbox is already 
working on an Ebox-initiated memory operation. The EM_LATCH in the Mbox holds the current 
Ebox memory request. It is not available until the Mbox has finished that request. The Mbox 
provides a status which informs the Ebox that the EM_LATCH is empty. 

Destination queue indirect stores that are memory stores go in the EM_LATCH like any other Ebox 
memory access. So EM_LATCH-full S4 stalls can occur even when the microword in S4 specifies 
MRQ/NOP. 

The Ebox stalls in S4 if the microword in S4 synchronizes with the branch queue and the branch 
queue is empty. The branch queue read and write pointers are examined to determine when the 
branch queue is empty. 

The Ebox stalls in S4 if the microword in S4 specifies MISC2/F.DEST.CHECK and the entry in the 
destination queue needed to complete this operation is not yet valid. This stall ends when the 
Ibox writes the needed entry. 

The destination queue has a second access pointer, the FDest pointer. This pointer is 
compared to the destination queue write pointer to determine when the entry needed for the 
MISC2/F.DEST.CHECK is available. 

When it is next to retire an instruction, the Fbox can cause an S4 stall by asserting 
F%STORE_STALL_H, indicating that the Fbox is stalling for this cycle because the data on 
F%FBOX_RESULT_H is incorrect or there is a data exception to be evaluated in the Fbox's last 
stage. F%STORE_STALL_H is only supposed to be asserted if the Fbox is storing a result (i.e., 
F%STORE_H is asserted). 

8.5.20.4 Fbox and RMUX Related Stall Conditions 

The Ebox has several Fbox related stalls. When the Fbox requests the RMUX the Ebox may have to 
stall the Fbox. Also, depending on which box (Fbox or Ebox) is next to retire a macroinstruction, 
several different Ebox stalls may occur. 

NOTE 

When the microcode needs to stall in S3 waiting for an Fbox operation to complete, 
one or two microwords which specify DST/WBUS should precede the microword needing 
the Fbox operation to be complete. Any microword specifying DST/WBUS will stall in 
S4 until the Fbox retires its instruction. The appropriate amount of delay depends on 
which result is being awaited. 

The Ebox stalls in S4 if the current retire queue entry specifies that the Fbox is next to retire 
a macroinstruction and the Ebox is requesting the RMUX. The Ebox is requesting the RMUX if 
the microword in S4 specifies anything other than NONE in the DST field. Otherwise it is not 
requesting the RMUX. 

The Ebox stalls the Fbox (by asserting a stall signal before the end of the cycle) when the Fbox is 
requesting the RMUX and one of the four following is true (note that if the Fbox is next to retire, 
the RMUX portion of the Ebox pipeline is stalled whenever the Ebox stalls the Fbox): 

• The Ebox is next to retire a macroinstruction. 

• The Fbox is next to retire a macroinstruction, is requesting to use the destination queue, and 
the current destination queue entry is not valid. 

• The Fbox is next to retire a macroinstruction, is requesting to use the destination queue, 
the current destination queue entry is valid and indicates a memory destination, and the PA 
queue is not valid. 

• The Fbox is next to retire a macroinstruction, is requesting to use the destination queue, the 
current destination queue entry is valid and indicates a memory destination, the PA queue 
is valid, and the EM_LATCH is full. 

The Ebox determines all these conditions as described in the previous section. No part of the 
Ebox pipeline is stalled by an Fbox request if the Ebox is next to retire a macroinstruction. 
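
The four conditions under which the Ebox stalls the Fbox can be summarized in a short decision function. This is a sketch under the assumption that every input is already available as a boolean; the names are hypothetical, and the separate F%STORE_STALL_H case is not modeled.

```python
def ebox_stalls_fbox(fbox_requests_rmux: bool,
                     ebox_next_to_retire: bool,
                     fbox_uses_dest_queue: bool,
                     dest_entry_valid: bool,
                     dest_entry_is_memory: bool,
                     pa_queue_valid: bool,
                     em_latch_full: bool) -> bool:
    """True if the Ebox must stall an Fbox request for the RMUX."""
    if not fbox_requests_rmux:
        return False
    if ebox_next_to_retire:
        return True                      # condition 1
    # From here on the Fbox is next to retire.
    if not fbox_uses_dest_queue:
        return False
    if not dest_entry_valid:
        return True                      # condition 2
    if not dest_entry_is_memory:
        return False
    if not pa_queue_valid:
        return True                      # condition 3
    return em_latch_full                 # condition 4
```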

The Fbox can cause an RMUX S4 stall by asserting F%STORE_STALL_H, indicating that the 
Fbox is stalling for this cycle because the data on F%FBOX_RESULT_H is incorrect or there 
is a data exception to be evaluated in the Fbox's last stage. (This also causes an S4 stall.) 
F%STORE_STALL_H is only supposed to be asserted if the Fbox is storing a result (i.e., F%STORE_H 
is asserted). 

The Ebox is always stalled in S4 if an RMUX S4 stall occurs. 

8.5.21 Miscellaneous Operations 

The microword allows for a number of miscellaneous control and data movement operations. 
Most of them have been described elsewhere in this chapter, and are only summarized here. The 
following table lists all the miscellaneous operations by microword field and gives a description. 
Any of these fields can also specify NOP (no operation). 

Table 8-20: Ebox Miscellaneous Operations 

MISC Field - Both Standard and Special Microword Formats 

Mnemonic                 Description 

DL.BYTE                  DL <- byte; change affects next microword 
DL.WORD                  DL <- word; change affects next microword 
DL.LONG                  DL <- long; change affects next microword 
RESTART.IBOX             restart Ibox operand specifier parsing in S5 
RESTART.MBOX             restart Mbox operand processing in S5 
RESET.CPU                flush Mbox and Fbox, initialize register file valid bits, 
                         flush Ebox queues, all in S6; stop Ibox in S5 
CLR.PERF.COUNT           clear the performance counters in S5. See Chapter 18 
INCR.PERF.COUNT          increment a performance counter in S5 if ECR<PMF_EMUX> 
                         is a certain value. See Chapter 18 
CLR.STATE.3-0            clear flags<3:0>; change affects next microword 
SET.STATE.0              set flag<0>; change affects next microword 
SET.STATE.1              set flag<1>; change affects next microword 
SET.STATE.2              set flag<2>; change affects next microword 
MULL                     disables the reserved instruction fault normally generated 
                         for Fbox instructions when the Fbox is not enabled. Used in 
                         MULL2 and MULL3 so microcode can execute the 
                         macroinstruction instead of the Fbox. 
CONST.10.BIT             special constant generation mode. See Section 8.5.2 
LOAD.SC.FROM.A           SC <- E_BUS%ABUS_L<4:0> 
LOAD.MPU.FROM.B          MPU <- E_BUS%BBUS_L<30:16> 
LOAD.PSL.CC.IIIP         update PSL CCs: 
                           PSL<N,Z,V> <- S5 Condition Codes <N,Z,V> 
                           PSL<C> <- PSL<C> (Unchanged) 
LOAD.PSL.CC.JIZJ         update PSL CCs: 
                           PSL<N> <- S5 Condition Code <N> .XOR. S5 Condition Code <V> 
                           PSL<Z> <- S5 Condition Code <Z> 
                           PSL<V> <- 0 
                           PSL<C> <- .NOT. S5 Condition Code <C> 
LOAD.PSL.CC.IIII         update PSL CCs: 
                           PSL<N,Z,V,C> <- S5 Condition Codes <N,Z,V,C> 
LOAD.PSL.CC.IIIJ         update PSL CCs: 
                           PSL<N,Z,V> <- S5 Condition Codes <N,Z,V> 
                           PSL<C> <- .NOT. S5 Condition Code <C> 
LOAD.PSL.CC.IIIP.QUAD    update PSL CCs: 
                           PSL<Z> <- PSL<Z> .AND. S5 Condition Code <Z> 
                           PSL<N,V> <- S5 Condition Codes <N,V> 
                           PSL<C> <- PSL<C> (Unchanged) 
LOAD.PSL.CC.PPJP         update PSL CCs: 
                           PSL<N,Z,C> <- PSL<N,Z,C> (Unchanged) 
                           PSL<V> <- .NOT. S5 Condition Code <Z> 
CLR.VECT.RDY             S3 clear of VECTOR RDY condition. See Section 8.5.18 

MISC1 Field - Special Format Microword 

Mnemonic                 Description 

RETIRE.INSTRUCTION       generate Ibox retire instruction signal in S5 
FLUSH.VIC                flush Ibox virtual instruction cache in S5 
FLUSH.BPC                flush Ibox branch prediction cache in S5 
FOP.VALID                Fbox operand on E%ABUS_H<31:0> or E%BBUS_H<31:0> or both 
FLUSH.PCQ                flush PC queue in Ibox 
CLR.STATE.5-4            clear flags<5:4>; change affects next microword 
SET.STATE.3              set flag<3>; change affects next microword 
SET.STATE.4              set flag<4>; change affects next microword 
SET.STATE.5              set flag<5>; change affects next microword 

MISC2 Field - Special Format Microword 

Mnemonic                 Description 

F.DEST.CHECK             access destination queue and make entry in Fbox 
                         destination scoreboard 
FLUSH.PAQ                flush PA queue in Mbox 

MRQ Field - Both Standard and Special Format Microwords 

Mnemonic                 Description 

SYNC.BDISP               stall if branch displacement invalid in S4; microtrap if fault 
SYNC.BDISP.RETIRE        stall if branch displacement invalid in S4; microtrap if 
                         fault; S5 retire entry 
SYNC.BDISP.TEST.PRED     stall if branch displacement invalid in S4; microtrap if 
                         fault; S5 microtrap if mispredict and retire entry 
LOAD.PC                  load new PC (always followed by MISC/RESTART.IBOX) 

DISABLE.RETIRE Field - Special Format Microword 

Mnemonic                 Description 

YES                      disable the retire macroinstruction and retire retire queue 
                         entry effects of SEQ.MUX/LAST.CYCLE and 
                         SEQ.MUX/LAST.CYCLE.OVERFLOW 
NO                       enable the retire macroinstruction and retire retire queue 
                         entry effects of SEQ.MUX/LAST.CYCLE and 
                         SEQ.MUX/LAST.CYCLE.OVERFLOW 
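
The six PSL condition code update modes listed under the MISC field can be illustrated with a small model. This is a sketch, not the hardware: the mode keys are the suffixes of the LOAD.PSL.CC mnemonics as best they can be read, the CC type and function name are hypothetical, and for the last mode it is assumed that N, Z, and C are the bits left unchanged.

```python
from typing import NamedTuple


class CC(NamedTuple):
    """Condition codes as single bits (0 or 1)."""
    n: int
    z: int
    v: int
    c: int


def load_psl_cc(mode: str, psl: CC, s5: CC) -> CC:
    """Return the new PSL condition codes for one LOAD.PSL.CC mode."""
    if mode == "IIII":
        return CC(s5.n, s5.z, s5.v, s5.c)
    if mode == "IIIP":
        return CC(s5.n, s5.z, s5.v, psl.c)
    if mode == "IIIJ":
        return CC(s5.n, s5.z, s5.v, s5.c ^ 1)
    if mode == "JIZJ":
        return CC(s5.n ^ s5.v, s5.z, 0, s5.c ^ 1)
    if mode == "IIIP.QUAD":
        return CC(s5.n, psl.z & s5.z, s5.v, psl.c)
    if mode == "PPJP":
        # Assumption: only V changes; N, Z, and C keep their PSL values.
        return CC(psl.n, psl.z, s5.z ^ 1, psl.c)
    raise ValueError(f"unknown mode: {mode}")
```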



The MISC1/RETIRE.INSTRUCTION function signals the Ibox to retire an instruction in order to 
bring the backup PC queue and the RLOG into the correct state for restoring GPRs and 
providing the backup PC after a microtrap. It does not retire a retire queue entry. Therefore 
MISC1/RETIRE.INSTRUCTION must always be followed by a MISC/RESET.CPU before the next 
macroinstruction execution dispatch (via SEQ.MUX/LAST.CYCLE). 

The MISC/RESET.CPU function causes E%STOP_IBOX_H to be asserted in S5 and E%FLUSH_MBOX_H, 
E_MSC%FLUSH_EBOX_H, and E%FLUSH_FBOX_H to be asserted in S6. 

8.5.22 Ebox IPRs 

The Ebox implements two IPRs. They are IPRs 124-125 (decimal), PCSCR and ECR. 

ECR is a possible source of E_BUS%ABUS_L<31:0>, accessed by specifying ECR in the A field of the 
microword. ECR and PCSCR are also possible destinations of E_BUS%WBUS_L<31:0>, written by 
specifying PCSCR or ECR in the DST field of the microword. On writes, the entire register is 
written, regardless of the current DL value. 

[Figure 8-9, PCSCR register layout: PATCH REV <28:24>, NONSTANDARD PATCH <23>, DATA <12>, 
RWL_SHIFT <11>, PCS_WRITE <10>, PCS_ENB <9>, PAR_PORT_DIS <8>; all other bits are 0.] 


8.5.22.1 IPR 7C (hex), Patchable Control Store Control Register 

The PCSCR is used to load control store patches. Chapter 9 describes the patchable control store 
function in detail. Figure 8-9 and Table 8-21 show the bit fields and give descriptions. 

Table 8-21: PCSCR Field Descriptions 

Name                   Extent  Type   Description 

PAR_PORT_DIS           8       RW,0   Writing a 1 disables control by the testability parallel 
                                      port of the section of the internal scan used in loading 
                                      the control store CAM (content addressable memory) and 
                                      RAM. This is necessary when using this register to load 
                                      the control store CAM and RAM. 

PCS_ENB                9       RW,0   Enables the control store CAM and RAM so that patches 
                                      are fetched and supersede the control store ROM. 

PCS_WRITE              10      WO     The event of writing a 1 to this bit causes the PCS scan 
                                      chain contents to be written into the control store CAM 
                                      and RAM. The control signal which enables the write 
                                      returns to the inactive state automatically; there is no 
                                      need for software to write a 0 to this bit after writing 
                                      a 1. This bit always reads as 0. 

RWL_SHIFT              11      WO     The event of writing a 1 to this bit causes the PCS scan 
                                      chain to shift by one. The control signal which enables 
                                      the shift returns to the inactive state automatically; 
                                      there is no need for software to write a 0 to this bit 
                                      after writing a 1. This bit always reads as 0. 

DATA                   12      RW,0   This bit holds the data which is shifted into the PCS 
                                      scan chain when a 1 is written to RWL_SHIFT. By 
                                      repeatedly setting DATA and writing a 1 to RWL_SHIFT, 
                                      software can shift any data pattern into the PCS scan 
                                      chain. 

NONSTANDARD PATCH [1]  23      RW     This bit is set by software after loading a microcode 
                                      patch. If it is 1, it indicates a non-standard microcode 
                                      patch has been loaded. This bit is returned as bit <8> 
                                      in a read from the SID processor register, except that 0 
                                      is substituted for this bit in microcode for a SID read 
                                      if PCSCR<PCS_ENB> is 0. 

PATCH REV [1]          28:24   RW     This field is set by software after loading a microcode 
                                      patch. It indicates the revision of the standard 
                                      microcode patch which has been loaded. This field is 
                                      returned as bits <13:9> in a read from the SID processor 
                                      register, except that 0 is substituted for this field in 
                                      microcode for a SID read if PCSCR<PCS_ENB> is 0. 

[1] This bit or field not implemented in pass 1 chips. 
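
The DATA and RWL_SHIFT description suggests a simple software loop for loading the PCS scan chain. The following is a hedged sketch that only builds the sequence of PCSCR values software might write; it performs no register access. The bit positions come from the field extents above, while the function name, the choice to write DATA first and then DATA together with RWL_SHIFT, and the order in which bits are shifted are assumptions. Chapter 9 remains the authority on the actual patch format and chain length.

```python
PAR_PORT_DIS = 1 << 8
PCS_ENB = 1 << 9
PCS_WRITE = 1 << 10
RWL_SHIFT = 1 << 11
DATA = 1 << 12


def pcscr_shift_sequence(bits):
    """Build the PCSCR write values that shift `bits` in and then commit them.

    The whole register is written on every write, so PAR_PORT_DIS is kept
    set in every value. PCS_ENB is left clear here; enabling the patches is
    a separate step.
    """
    writes = [PAR_PORT_DIS]                  # take the scan section away from the parallel port
    for bit in bits:
        value = PAR_PORT_DIS | (DATA if bit else 0)
        writes.append(value)                 # present the data bit
        writes.append(value | RWL_SHIFT)     # shift the chain by one
    writes.append(PAR_PORT_DIS | PCS_WRITE)  # write chain contents to CAM and RAM
    return writes
```

For example, shifting in the two bits 1 then 0 yields six register writes, ending with the PCS_WRITE commit.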



8.5.22.2 IPR 7D (hex), Ebox Control Register 

The ECR is used to configure certain Ebox functions. Figure 8-10 and Table 8-22 show the bit 
fields and give descriptions. 

[Figure 8-10, ECR register layout: PMF_CLEAR <31>, PMF_LFSR <22>, PMF_EMUX <21:19>, 
PMF_PMUX <18:17>, PMF_ENABLE <16>, FBOX_TEST_ENABLE <13>, ICCS_EXT <7>, TIMEOUT_CLOCK <6>, 
TIMEOUT_TEST <5>, TIMEOUT_OCCURRED <4>, FBOX_ST4_BYPASS_ENABLE <3>, TIMEOUT_EXT <2>, 
FBOX_ENABLE <1>, VECTOR_PRESENT <0>; all other bits are 0.] 


Table 8-22: ECR Field Descriptions 

Name                     Extent  Type   Description 

VECTOR_PRESENT           0       RW,0   This bit is for vector unit support in a future version 
                                        of this chip. 

FBOX_ENABLE              1       RW,0   This bit is set to a 1 by configuration code to enable 
                                        the Fbox. 

TIMEOUT_EXT              2       RW,0   This bit is set to a 1 by configuration code to select 
                                        an external timebase for the S3 stall timeout timer. 

FBOX_ST4_BYPASS_ENABLE   3       RW,0   This bit is set to a 1 by configuration code to enable 
                                        Fbox Stage 4 bypass. 

TIMEOUT_OCCURRED         4       WC     This bit indicates that an S3 stall timeout occurred. 
                                        Writing it with a 1 clears it. 

TIMEOUT_TEST             5       RW,0   If this bit is a 1, the S3 stall timeout circuit counts 
                                        every cycle instead of only cycles in which 
                                        E%TIMEOUT_ENABLE_H is asserted. In this test mode the 
                                        S3 stall timeout time is roughly 50 microseconds instead 
                                        of roughly 3 seconds. 

TIMEOUT_CLOCK            6       RO     This bit is the most significant bit of the timeout base 
                                        counter. It is used as an indication that 
                                        E%TIMEOUT_ENABLE_H is functioning (though some logic is 
                                        not covered by this test). It should be 1 half of the 
                                        time and 0 the other half of the time. The period of the 
                                        oscillation is 65536 times the cycle time of the chip or 
                                        of the waveform on P%OSC_TC1_H, depending on 
                                        ECR<TIMEOUT_EXT>. For ECR<TIMEOUT_EXT> set to 0 and a 
                                        14 nsec cycle time, this is a period of roughly 900 
                                        microseconds. 

ICCS_EXT                 7       RW,0   This bit is set by configuration code to select the 
                                        interval timer mode. When it is 0, the CPU implements a 
                                        subset interval timer with ICCS<6> maintained on the 
                                        chip. When set to 1, the CPU implements a full interval 
                                        timer with ICCS, NICR, and ICR processor registers 
                                        implemented off chip. See Chapter 10. 

FBOX_TEST_ENABLE         13      RW,0   When this bit is set to a 1, E%FBOX_TEST_ENB_H is 
                                        asserted. This puts the Fbox in a test mode in which 
                                        data is passed from stage to stage unaltered. 

PMF_ENABLE               16      RW,0   This bit is the internal implementation of the PME 
                                        processor register. See Section 18.2.4 for more detail. 

PMF_PMUX                 18:17   RW,0   This field selects the source of the events counted by 
                                        the performance monitoring facility, when enabled, to be 
                                        Ibox, Ebox, Mbox, or Cbox. See Section 18.2.3 for more 
                                        detail. 

PMF_EMUX                 21:19   RW,0   This field selects the Ebox events counted by the 
                                        performance monitoring facility, when the performance 
                                        monitoring facility is configured to count Ebox events. 
                                        See Table 18-3 for more detail. 

PMF_LFSR                 22      RW,0   This bit enables the E%WBUS_H<31:0> LFSR (linear 
                                        feedback shift register) accumulator. This is a 
                                        testability feature. See Section 8.5.26.2 for more 
                                        detail. 

PMF_CLEAR                31      WO     Writing a 1 to this bit clears the performance 
                                        monitoring facility counters (which are also the 
                                        E%WBUS_H<31:0> LFSR accumulator). It is not implemented 
                                        in hardware. Microcode handles this function. 
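
A raw ECR value can be split into its fields with a small table-driven decoder. This is an illustrative sketch; the dictionary and function names are hypothetical, and the positions of TIMEOUT_CLOCK and ICCS_EXT (bits 6 and 7) are assumptions inferred from the order of the fields in the register layout figure.

```python
# (high bit, low bit) for each ECR field.
ECR_FIELDS = {
    "VECTOR_PRESENT": (0, 0),
    "FBOX_ENABLE": (1, 1),
    "TIMEOUT_EXT": (2, 2),
    "FBOX_ST4_BYPASS_ENABLE": (3, 3),
    "TIMEOUT_OCCURRED": (4, 4),
    "TIMEOUT_TEST": (5, 5),
    "TIMEOUT_CLOCK": (6, 6),     # assumed position
    "ICCS_EXT": (7, 7),          # assumed position
    "FBOX_TEST_ENABLE": (13, 13),
    "PMF_ENABLE": (16, 16),
    "PMF_PMUX": (18, 17),
    "PMF_EMUX": (21, 19),
    "PMF_LFSR": (22, 22),
    "PMF_CLEAR": (31, 31),
}


def decode_ecr(value: int) -> dict:
    """Return a mapping from field name to the field's value in `value`."""
    fields = {}
    for name, (hi, lo) in ECR_FIELDS.items():
        width = hi - lo + 1
        fields[name] = (value >> lo) & ((1 << width) - 1)
    return fields
```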



8.5.23 Initialization 

The main mechanism for Ebox initialization is the power-up microtrap, and the MISC/RESET.CPU 
which occurs in the first microword of this microtrap flow. When this trap occurs, the 
Microsequencer will assert E_USQ%PE_ABORT_L, aborting the Ebox pipeline as it does for any 
microtrap. None of the registers in the register file or elsewhere in the Ebox are cleared on 
initialization, except that IPR bits are cleared where indicated by the bit type (see Section 8.5.22). 
The state flags are also cleared by reset. 

The Ebox asserts E%STOP_IBOX_H, E_MSC%FLUSH_EBOX_H, E%FLUSH_MBOX_H, 
and E%FLUSH_FBOX_H during reset. This has the same effect as MISC/RESET.CPU. See the sections 
on initialization for each of the boxes for more detail. 

8.5.24 Timing 

TBS. A timing diagram for major Ebox signals will someday appear here. 

8.5.25 Error Detection 

Ebox handling of memory management faults and hardware errors detected by the Mbox while 
processing an Ebox or Ibox request is covered in Section 8.5.19 and Section 8.5.17. 

8.5.25.1 S3 Stall Timeout 

The Ebox implements an S3 stall timeout timer. The timeout time is shown in Table 8-23. 

Figure 8-11 shows all the NVAX timeout timers, including those implemented in the Cbox. The 
Cbox timeout timers are shown because they use E%TIMEOUT_BASE_H as their timebase. See 
Section 13.4.3.4 for more detail on the Cbox timeout timers. 

The timeout timer input is E%TIMEOUT_BASE_H, which is created internally by dividing the CPU 
clock by 65536. As an alternative, in systems which require longer timeout times than NVAX 
implements, this timer can use an externally supplied timebase. To select the external timebase, 
K%EXT_TMBS_H, ECR<TIMEOUT_EXT> is set to 1. In this case the base counter counts cycles of 
K%EXT_TMBS_H instead of the NVAX CPU internal clock. K%EXT_TMBS_H is a synchronized version 
of the signal received on pin P%OSC_TC1_H. Note that P%OSC_TC1_H is synchronized in the 
clock section to NDAL clocks and must therefore be driven with a clock signal which is high for 
longer than one NDAL cycle and low for longer than one NDAL cycle. For a square wave clock 
waveform this implies a speed of 11.9 MHz or less. 
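
The 11.9 MHz figure can be checked with a short calculation. The sketch below assumes that one NDAL cycle is three CPU cycles, which is not stated in this section, and uses a hypothetical function name. A square wave that is high for at least one NDAL cycle and low for at least one NDAL cycle has a period of at least two NDAL cycles.

```python
def max_ext_timebase_mhz(cpu_cycle_ns: float, cpu_cycles_per_ndal: int = 3) -> float:
    """Upper bound on the external timebase frequency in MHz.

    cpu_cycles_per_ndal = 3 is an assumption about the NDAL clock ratio.
    """
    ndal_cycle_ns = cpu_cycle_ns * cpu_cycles_per_ndal
    min_period_ns = 2 * ndal_cycle_ns      # one NDAL cycle high, one low
    return 1000.0 / min_period_ns          # 1000 / ns gives MHz
```

Under that assumption a 14-ns chip gives an 84-ns minimum period, or about 11.9 MHz, which matches the limit quoted above.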

Table 8-23: S3 Stall Timeout Values in Normal Mode 

Cycle time     Timeout Granularity    S3 Stall timeout 

10-ns NVAX     655 microseconds       2.6837 (min) to 2.68435 (max) seconds 
12-ns NVAX     786 microseconds       3.22044 (min) to 3.22123 (max) seconds 
14-ns NVAX     917 microseconds       3.75718 (min) to 3.7581 (max) seconds 



Figure 8-11: NVAX Timeout Counters 

[Block diagram not reproduced. Labeled elements: a base counter selected by ECR<TIMEOUT_EXT> 
whose carry out is E%TIMEOUT_BASE_H; E%TIMEOUT_ENABLE_H; ECR<TIMEOUT_TEST>; 
E_FLT%S3_TIMEOUT_STALL_H; the 12-bit Ebox counter; and its output E_TIM%S3_TIMEOUT_H. 
E%TIMEOUT_BASE_H is also the timebase for the Cbox timeout counters.] 

In every cycle, the Ebox counter increments if one of the following is true: 

• S3 is stalled. 

• The microword in S3 is the STALL microword (as determined by the piped version of 
E_USQ%IQ_STALL_H sent from the Microsequencer). 

• The field queue is being accessed via a microcode conditional branch and is empty 
(E_FLQ%FQ_STALL_H is asserted). 

These conditions are accumulated into one condition, E_FLT%S3_TIMEOUT_STALL_H, in the fault 
logic section of the Ebox. If none of the above conditions is true, the Ebox counter is reset to 0. 
If the counter reaches its maximum value and overflows, an immediate asynchronous hardware 
error microtrap is forced. The microtrap breaks the Ebox stall by aborting the pipeline. 

When the S3 stall timeout timer overflows, forcing a microtrap, the signal E%S3_TIMEOUT_H is 
asserted for one cycle. This causes the chip reset logic to reset the Mbox and Cbox. Microcode, 
in handling the asynchronous hardware error microtrap, must also do MISC/RESET.CPU in order to 
properly reset the Mbox. 
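
The counting behavior can be sketched as a small state machine. This is a simplified model, not the hardware: the 12-bit width is taken from the counter label in the timeout counter figure, the `enable` input stands for E%TIMEOUT_ENABLE_H (always true in test mode), and the class and method names are hypothetical. The exact phase relationship that produces the minimum and maximum timeout values is not modeled.

```python
class S3TimeoutCounter:
    """Simplified model of the Ebox S3 stall timeout counter."""

    WIDTH = 12

    def __init__(self) -> None:
        self.count = 0

    def tick(self, stalled: bool, enable: bool = True) -> bool:
        """Advance one CPU cycle; return True on the cycle the counter overflows.

        `stalled` is the combined condition E_FLT%S3_TIMEOUT_STALL_H.
        """
        if not stalled:
            self.count = 0          # any non-stalled cycle resets the counter
            return False
        if enable:
            self.count += 1
            if self.count == 1 << self.WIDTH:
                self.count = 0
                return True         # overflow forces the hardware error microtrap
        return False
```

In this model a continuous stall with `enable` always true overflows on the 4096th stalled cycle, while a single non-stalled cycle at any point restarts the count.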

The Ebox timeout counter treats cycles in which the pipeline advances the STALL microword 
into S3 as an S3 stall cycle. If the Microsequencer sends STALL microwords into the pipeline 
continuously, the timer will eventually time out. This is the case when the instruction queue in 
the Microsequencer remains empty forever. 

Similarly, if microcode is in an infinite loop, conditionally branching on the field queue contents, 
an S3 stall timeout will occur. 

Any true S3 or S4 stall which lasts forever will cause an S3 stall timeout. It is expected that 
some hardware failures within the NVAX CPU could cause the Ebox to get out of sync with the 
Ibox, Mbox, or Fbox. This could result in the Ebox waiting forever for an event which will never 
happen. This timeout timer causes a machine check exception to occur instead of allowing the 
CPU to simply hang. 

8.5.25.1.1 Testing the S3 Stall Timeout Timer 

The Ebox timeout counter may be configured for testing by writing a 1 to ECR<TIMEOUT_TEST>. 
When this bit is 1, the Ebox counter counts NVAX CPU internal clock cycles instead of cycles of 
E%TIMEOUT_BASE_H. Table 8-24 gives the timeout times in test mode. See the timeout counter 
test discussion in Section 13.4.3.4 for detail on how to cause a timeout for test purposes. The 
timeout will cause the asynchronous hardware error machine check (see Chapter 15). 



Table 8-24: S3 Stall Timeout Values in Test Mode 

Cycle time     Timeout Granularity    S3 Stall timeout 

10-ns NVAX     10 nanoseconds         40.95 (min) to 40.96 (max) microseconds 
12-ns NVAX     12 nanoseconds         49.14 (min) to 49.152 (max) microseconds 
14-ns NVAX     14 nanoseconds         57.33 (min) to 57.344 (max) microseconds 



DERIVATION OF TIMEOUT VALUES 

The timeout values given above were derived as follows: 

Table 8-25: Derivation of NVAX Timeout Values 

NVAX mode    Timeout Granularity    S3 Stall timeout 
             (in NVAX cycles)       (in NVAX cycles) 

Normal       2**16                  2**28 - 2**16 (min) to 2**28 (max) 
Test         1                      2**12 - 1 (min) to 2**12 (max) 
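
The derivation can be checked numerically. The following sketch converts the cycle counts above into nanoseconds for a given cycle time; the function name is hypothetical.

```python
def s3_timeout_ns(cycle_ns: int, test_mode: bool = False) -> tuple:
    """Return (min, max) S3 stall timeout in nanoseconds for a given cycle time."""
    if test_mode:
        lo_cycles, hi_cycles = 2**12 - 1, 2**12
    else:
        lo_cycles, hi_cycles = 2**28 - 2**16, 2**28
    return lo_cycles * cycle_ns, hi_cycles * cycle_ns


if __name__ == "__main__":
    for cycle in (10, 12, 14):
        lo, hi = s3_timeout_ns(cycle)
        print(f"{cycle}-ns normal: {lo / 1e9:.5f} to {hi / 1e9:.5f} seconds")
        lo, hi = s3_timeout_ns(cycle, test_mode=True)
        print(f"{cycle}-ns test:   {lo / 1e3:.3f} to {hi / 1e3:.3f} microseconds")
```

For a 10-ns part in normal mode this gives 2,683,699,200 ns to 2,684,354,560 ns, that is, about 2.6837 to 2.68435 seconds.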



8.5.26 Testability 

This section describes the testability features in the Ebox. 

8.5.26.1 Parallel Port Test Features 

The microaddress currently being used to access the control store is visible on the parallel port. 
Much information about Ebox execution can be inferred from the sequence of microaddresses 
seen on the parallel port. See Section 9.5. 

No other Ebox signal is visible directly at the parallel port. Quite a few are visible through the 
internal observability scan chain controlled via the parallel port. Table 8-26 
shows these signals. Timing information and a description are given for each signal. 

The scan chain loads input data in Φ4. If a signal is not ready to be latched in Φ4, it has to be 
delayed before being loaded into the scan chain. This implies that the particular signal's value 
sampled by the scan chain is from one cycle earlier than the cycle in which the scan chain was 
loaded. This is shown in the Timing column of Table 8-26. 

Table 8-26 lists the scan chain data bits in the order in which they would appear at the parallel port. 
The value of E_RGF%ERROR_H appears first and the value of F%STORE_STALL_H appears last. 



Table 8-26: Ebox Observe Scan Signals 



Schematic Signal 



Timing 



E_RGF%ERROR_L 



E_RGF%FAULT_L 



I%IMEM_MEXC_H 



delayed 



delayed 



A 1 value means the Ebox is detecting a hardware 
error associated with the current MD read (including 
bypassed MD reads) or with a current S3 Ibox-to-Ebox 
queue access (instruction queue, source queue, or field 
queue). 

A 1 value means the Ebox is detecting a memory 
management fault associated with the current MD 
read (including bypassed MD reads) or with a current 
S3 Ibox-to-Ebox queue access (instruction queue, 
source queue, or field queue). 

A 0 value means the Ibox is signaling a memory 
management exception associated with one of the 
Ibox-to-Ebox queues (instruction queue, source queue, 
field queue, branch queue, or destination queue). 

Table 8-26 (Cont.): Ebox Observe Scan Signals 



Schematic Signal 



Timing 



Description 



F%INPUT_STALL_H 
I%IMEM_HERR_H 

E_SDQ%DQ_STALL_L 

E_MEM%F_STORE_H 

E_RGF%IW_BYPASS_B_L 

E_RGF%BDATA_VALID_L 

E_RGF%IW_BYPASS_A_L 

E_RGF%ADATA_VALID_L 

E_SDQ%SCOREBOARD_HIT_STALL_H 



E_SDQ%SQ_STALL_H 
E_RTQ%RQ_SEL_EBOX_STL_L 

E_FLT%MME_ERR_H delayed 



delayed 
delayed 
delayed 
delayed 



E_FLT%HW_ERR_H 

E_SDQ%S4_FDEST_STALL_L 

VSS 

E_RGF%WNGPR_ERROR_H 
E_FLT%Q_FAULT_H 



delayed 



A 0 value means the Fbox is currently requesting that 
no more input data be sent. 

A 0 value means the Ibox is signaling a hardware 
error associated with one of the Ibox-to-Ebox queues 
(instruction queue, source queue, field queue, branch 
queue, or destination queue). 

A 1 value means the destination queue is being 
accessed and there isn't a valid entry. 

A 0 value means the Fbox is requesting a store in this 
cycle. 

A 1 value means the Ibox register file write is being 
bypassed to E_BUS%BBUS_L. 

A 1 value means the data on E_BUS%BBUS_L is valid 
(otherwise a MD or WN stall would occur). 

A 1 value means the Ibox register file write is being 
bypassed to E_BUS%ABUS_L. 

A 1 value means the data on E_BUS%ABUS_L is valid 
(otherwise a MD or WN stall would occur). 

A 0 value means the Fbox destination scoreboard in 
the destination queue has a hit (i.e., a current source 
queue based register file read is to a register the Fbox 
will update in the future). 

A 0 value means the current source queue read(s) is 
(are) accessing an empty location - one kind of S3 stall. 

A 1 value means the Ebox is next to retire an 
instruction, not the Fbox. 

A 0 value means the Ebox is signaling the 
microsequencer to initiate a memory management 
fault micro-trap. 

A 0 value means the Ebox is signaling the 
microsequencer to initiate a synchronous hardware 
error microtrap. 

A 1 value means the Ebox is stalled in S4 doing the 
FDEST.CHECK operation and the destination queue 
doesn't contain the necessary entry or entries. 

Always a 1 value. 

A 0 value means the Ebox is recognizing a hardware 
error because the Mbox wrote a working register or 
GPR while asserting M%HARD_ERR_H. 

A 0 value means the Ebox is detecting a memory 
management fault with a current S3 Ibox-to-Ebox 
queue access (instruction queue, source queue, or field 
queue). 

Table 8-26 (Cont.): Ebox Observe Scan Signals 



Schematic Signal 



Timing 



Description 



E_FLT%Q_ERROR_H 
F%CC_MAP_H<1> 

F%CC_MAP_H<0> 

F%RETIRE_H 
E_STL%LAT_PAQ_STL_H 

E_STL%BQ_STALL_H 

F%MERR_H 

F%RSVD_ADDR_MODE_H 

F%MMGT_FAULT_H 

F%RSV_H 
F%FOV_H 
F%FDBZ_H 

F%FCT_H 

M%EM_LAT_FULL_H 
E%ERKFJKEQ_H 

VSS 

E%START_IBOX_IO_RD_H 



delayed 



A 0 value means the Ebox is detecting a hardware 
error with a current S3 Ibox-to-Ebox queue access 
(instruction queue, source queue, or field queue). 

A 0 value means the most significant bit of this field is 
a 1. See Table 8-6. This data is valid for the condition 
code alteration in the current cycle (S5), provided it is 
an Fbox instruction being retired. 

A 0 value means the least significant bit of this field is 
a 1. See Table 8-6. This data is valid for the condition 
code alteration in the current cycle (S5), provided it is 
an Fbox instruction being retired. 

A 0 value means the Fbox is requesting an instruction 
retire in this cycle. 

A 0 value means the Ebox is stalling because the PA 
queue is not valid and the current destination queue 
access is requiring the use of the PA queue. 

A 0 value means the Ebox is stalling because the 
branch queue is empty and the current microinstruction 
in S4 accesses it. 

A 0 value means the Fbox is signaling a hardware error 
on one of the source operands for the currently retiring 
instruction. 

A 0 value means the Fbox is signaling a reserved 
address mode fault on one of the source operands for 
the currently retiring instruction. 

A 0 value means the Fbox is signaling a memory 
management fault on one of the source operands for 
the currently retiring instruction. 

A 0 value means the Fbox is signaling a reserved 
operand fault for the currently retiring instruction. 

A 0 value means the Fbox is signaling a floating 
overflow fault for the currently retiring instruction. 

A 0 value means the Fbox is signaling a floating 
divide-by-zero fault for the currently retiring instruction. 

A 0 value means the Fbox is signaling a floating 
underflow fault for the currently retiring instruction. 

A 0 value means the Mbox is signaling that the 
EM_LATCH is full. 

A 0 value means the Ebox is making an Mbox request 
in this current cycle. 

Always a 1 value. 

A 0 value means the Ebox is signaling the Mbox that 
an Ibox IO space read may begin in the current cycle, 
subject to certain Mbox restrictions. 

Table 8-26 (Cont.): Ebox Observe Scan Signals 




Schematic Signal 


Timing 


Description 


F%STORE_STALL_H 




A 0 value means the Fbox aborted a store request late 
in the current cycle. 



8.5.26.2 E%WBUS_H<31:0> LFSR

E%WBUS_H<31:0> (the buffered copy of E_BUS%WBUS_L<31:0> which is driven to the Mbox) has
an LFSR (linear feedback shift register) accumulator. The LFSR is implemented as part of the
performance monitoring facility that is described in Chapter 18, and controlled by two bits in the
ECR processor register: PMF_LFSR and PMF_CLEAR.

The E%WBUS_H<31:0> LFSR is implemented as two identical 16-bit LFSRs, one for E%WBUS_H<31:16>
and one for E%WBUS_H<15:0>. A block diagram of one of these 16-bit LFSRs is shown in Figure 8-12.
Note that the output of the left-most bit in the LFSR chain is inverted before
being XORed with earlier taps. This was done for implementation reasons.

Figure 8-12: E%WBUS_H LFSR Block Diagram

[Block diagram not reproduced. It shows a chain of 16 LFSR stages labeled, from left to right,
<31,15> <30,14> <29,13> ... <17,01> <16,00>; each label gives the stage's bit position in the
upper-half and lower-half LFSR respectively.]



Both halves of the E%WBUS_H<31:0> LFSR may be cleared by software by writing a 1 to
ECR<PMF_CLEAR> (which results in microcode executing the MISC/CLR.PERF.COUNT function).
The operation of the pair of LFSRs is started by software by writing a 1 to ECR<PMF_LFSR>
and stopped by writing a 0 to the same bit. The current state of the E%WBUS_H<31:0> LFSR may
be read by software via the PMFCNT processor register (an E_BUS%ABUS_L<31:0> source available
via MFPR) in the format shown in Figure 8-13.

Figure 8-13: PMFCNT Processor Register in E%WBUS_H<31:0> LFSR Format

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+-----------------------------------------------+-----------------------------------------------+
|           E%WBUS<31:16> LFSR Value            |           E%WBUS<15:00> LFSR Value            | :PMFCNT
+-----------------------------------------------+-----------------------------------------------+
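
As an illustration of the Figure 8-13 layout, the following Python sketch splits a 32-bit PMFCNT value into its two 16-bit LFSR halves. The function name split_pmfcnt is hypothetical, and the sketch assumes the value has already been obtained (for example via MFPR); it does not model the LFSR taps themselves.

```python
def split_pmfcnt(pmfcnt: int) -> tuple:
    """Split a PMFCNT value (LFSR format) into (upper_lfsr, lower_lfsr).

    Hypothetical helper: the field boundaries follow Figure 8-13,
    bits <31:16> and bits <15:0>.
    """
    if not 0 <= pmfcnt <= 0xFFFFFFFF:
        raise ValueError("PMFCNT is a 32-bit register")
    upper = (pmfcnt >> 16) & 0xFFFF  # LFSR fed by E%WBUS_H<31:16>
    lower = pmfcnt & 0xFFFF          # LFSR fed by E%WBUS_H<15:0>
    return upper, lower


upper, lower = split_pmfcnt(0x1234ABCD)
print(hex(upper), hex(lower))
```

Because the two halves are independent 16-bit LFSRs, a signature comparison can be done per half.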



CAUTION 



The E%WBUS_H<31:0> LFSR hardware also provides the performance monitoring facility
function under control of ECR<PMF_ENABLE>. The operation of the hardware is
UNDEFINED if both ECR<PMF_ENABLE> and ECR<PMF_LFSR> are on, or if software uses a
single MTPR write to turn off one bit and turn on the other simultaneously. That is, if
either bit is on, software must turn off both bits with one MTPR and turn on the other
with a second MTPR.
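
The rule in this caution can be sketched as a small Python check. This is only an illustration under stated assumptions: the function names are hypothetical, the two ECR bits are modeled as a (pmf_enable, pmf_lfsr) pair of booleans, and no ECR bit positions are implied.

```python
def single_write_is_defined(old, new):
    """Return True if one MTPR may move the (pmf_enable, pmf_lfsr) pair from old to new.

    Hypothetical model of the caution: both bits on is UNDEFINED, and so is
    turning one bit off while turning the other on in the same write.
    """
    old_enable, old_lfsr = old
    new_enable, new_lfsr = new
    if new_enable and new_lfsr:
        return False
    swap_to_lfsr = old_enable and not old_lfsr and new_lfsr and not new_enable
    swap_to_enable = old_lfsr and not old_enable and new_enable and not new_lfsr
    return not (swap_to_lfsr or swap_to_enable)


def mtpr_sequence(old, new):
    """Return the list of (pmf_enable, pmf_lfsr) states to write, in order."""
    if new[0] and new[1]:
        raise ValueError("PMF_ENABLE and PMF_LFSR must never both be on")
    if single_write_is_defined(old, new):
        return [new]
    # Switching modes: first write turns both bits off, second turns the other on.
    return [(False, False), new]


print(mtpr_sequence((True, False), (False, True)))
```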



8.5.27 Microcode Restrictions 

This section gives microcode restrictions due to Ebox microarchitecture and the VAX architecture. 

8.5.27.1 Register Access Restriction 

The first microword of any execution microflow must not read GPRs explicitly, and an explicit read
must be preceded by at least one microword specifying something other than NONE in the DST field.
(A/S1, A/S2, B/S1, and B/S2 are always allowed.) This restriction has to do with the fact that the Fbox
destination scoreboard only examines the source queue outputs to detect GPR read-before-write
hazards. Therefore it specifically does not apply in a microtrap flow since the Fbox can never
write a result after a microtrap.

8.5.27.2 FLUSH.PAQ Restriction

MISC2/FLUSH.PAQ should only be specified when the MRQ field specifies an Mbox operation
which is sent in the EM latch (i.e., other than MRQ/SYNC.BDISP, MRQ/SYNC.BDISP.RETIRE,
MRQ/SYNC.BDISP.TEST.PRED, or MRQ/NOP). Otherwise the Mbox will not flush the PA queue.

8.5.27.3 Memory Access Restrictions

Microcode must ensure that all accesses from the current microflow are complete before allowing 
the microsequencer to dispatch to the next microflow. 

Destination queue indirect writes (DST/DST) may be implicit memory operations. The MRQ field
must specify NOP, SYNC.BDISP, SYNC.BDISP.RETIRE, or SYNC.BDISP.TEST.PRED when this operation is
specified.

8.5.27.4 Shifter Restrictions 

If the shifter uses the SC register as the source of the shift amount, the SC must have been
loaded from E_BUS%ABUS_L<4:0> by the previous microword or from E_BUS%WBUS_L<4:0> by the
microword before that. Otherwise the old SC value is used as the shift amount.

8.5.27.5 SHIFT.SIGN Restriction

The saved copy of the shifter sign bit (Saved-SHF<N>) is UNPREDICTABLE after executing a special 
format microword. 

8.5.27.6 MMGT.MODE Restrictions 

The MMGT.MODE register must be loaded before (in a microword preceding) the microword 
specifying a memory management probe in the MRQ field. 

8.5.27.7 MPU Restrictions 

If the MPU mask value is loaded by microword N specifying MISC/LOAD.MPU.FROM.B, microcode
may not branch on the new value until microword N+2. Microcode may branch on the old value
in N and N+1.

8.5.27.8 Microbranch Condition Restrictions 

The first microword of a macroinstruction execution microflow should not branch based on the 
state flags. (It may set or clear them.) 

8.5.27.9 Ibox IPR Read Restriction

Microcode should not use GPRs as the target for read type accesses to Ibox IPRs. There is no 
synchronization mechanism to determine when the result is ready. Also, the control logic in the 
Ibox IPR assumes a working register is the destination. 

8.5.27.10 RETIRE.INSTRUCTION

The MISC1 field operation RETIRE.INSTRUCTION must always be followed by a MISC/RESET.CPU.
The MISC/RESET.CPU may come any number of cycles later, but must come before the next
macroinstruction microflow is dispatched.

8.5.27.11 VAX Restart Bit Restriction 

The VAX Restart Bit should not be read until two microwords after the last microword whose 
effect is expected to be reflected in the bit's state. For example, the machine check microflow 
should wait until the second microword before reading the bit to put it on the stack. Then the 
bit will reflect the state for the aborted execution microflow. 

8.5.27.12 Q Register Interaction With SMUL.STEP and UDIV.STEP 

In the microword after the last ALU/SMUL.STEP or ALU/UDIV.STEP, the Q register should not be
sourced to E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0>. Bypassing is not implemented for this
kind of Q register update.

The microword before an ALU/SMUL.STEP must not update the Q register (Q/UPDATE.Q) unless that
microword also specifies ALU/SMUL.STEP.

8.5.27.13 UDIV/SMUL Restrictions

The Q field must specify Q/UPDATE.Q if the ALU field functions SMUL.STEP or UDIV.STEP are specified.
Also the ALU result must be specified as the source of E_BUS%WBUS_L<31:0>, and the shifter
operation must be NOP.

8.5.27.14 F.DEST.CHECK Restrictions 

The F.DEST.CHECK miscellaneous operation should only be used as intended. It should be specified
in the last microword of a microflow which sends operands to the Fbox. It should never be specified
in a microword which also specifies DST in the DST field.

8.5.27.15 Fbox Operand Delivery Restriction 

In delivering operands to the Fbox, microcode may not use A/S2 or B/S1. Short literal bypass
to the Fbox source operand buses is not implemented for these decodes. Use of these decodes
for Fbox operands could cause improper input data formatting in the Fbox if a short literal data
item is present in the source queue.

8.5.27.16 RMUX Control Restrictions

Every microword with an S4 or S5 side effect of modifying any state (examples include
SYNC.BDISP.RETIRE, RESET.CPU, and LOAD.PSL.CC.XXXX) must specify a DST other than NONE. A DST
of WBUS is acceptable. This restriction specifically does not apply to F.DEST.CHECK.

Every microword specifying any operation other than NOP in the MRQ field must specify a DST 
other than NONE. A DST of WBUS is acceptable. 

8.5.27.17 Control Bits 

After changing either of ECR<1 or 3> (FBOX_ENABLE or FBOX_ST4_BYPASS_ENABLE), microcode should
not do a SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW in the three microwords following
the one altering the control bit.

8.5.27.18 Microtrap Dispatch and RESET.CPU Restrictions

8.5.27.18.1 Microtrap Flows

In a microtrap handler for any microtrap except branch mispredict, microcode must do a
MISC/RESET.CPU before it can read any of the registers in the register file which have a valid bit.
This restriction is necessary to avoid deadlock. Specifically, microcode must not source any Wn
register (working register) until the microword after the one which specifies MISC/RESET.CPU.

In a microtrap handler for any microtrap except branch mispredict, there should be no memory 
request until the third microword after the one specifying MISC/RESET.CPU. 

In a microtrap handler for any microtrap except branch mispredict, any microcode operation 
which causes an entry in the retire queue to be retired is illegal until a MISC/RESET.CPU is executed 
and a second microword specifying SEQ.MUX/LAST.CYCLE and DISABLE.RETIRE/YES is executed.
This second microword must not occur until after the third microword after the one specifying 
MISC/RESET.CPU. 




NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 



In a microtrap handler for any microtrap except branch mispredict, UNPREDICTABLE or UNDEFINED 
results could occur if microcode accesses the source queue, destination queue, instruction 
queue, branch queue, or field queue until the fourth microword after the one which specifies 
MISC/RESET.CPU. Similarly, UNPREDICTABLE results could occur if microcode reads from the Wn or 
GPR register before the fourth microword after the one which specifies MISC/RESET.CPU, or writes 
to these registers before the second microword after the one which specifies MISC/RESET.CPU. 

8.5.27.18.2 MISC/RESET.CPU Restrictions 

The fourth microword after one specifying MISC/RESET.CPU may specify SEQ.MUX/LAST.CYCLE 
(with DISABLE.RETIRE/YES), but the first three must not. The first three microwords after a 
MISC/RESET.CPU must not access the source queue or field queue. The first two microwords after
a MISC/RESET.CPU must not access the destination queue or branch queue.

The first two microwords after a MISC/RESET.CPU must not issue memory requests.

After a microword specifying MISC/RESET.CPU, any microcode operation which causes an entry 
in the retire queue to be retired is illegal until a microword specifying SEQ.MUX/LAST.CYCLE and 
DISABLE.RETIRE/YES is executed. This microword must not occur until after the third microword 
after the one specifying MISC/RESET.CPU. 
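
The general MISC/RESET.CPU rules of this subsection can be summarized as a lookup of the earliest microword, counted from the one specifying MISC/RESET.CPU, at which each operation is permitted. The Python sketch below is illustrative only: the operation names and the function are hypothetical, and it encodes just the rules stated in this subsection, not the stricter microtrap-flow rules of Section 8.5.27.18.1.

```python
# Earliest microword after MISC/RESET.CPU (1 = the very next microword)
# at which each operation may appear, per this subsection.
EARLIEST_AFTER_RESET_CPU = {
    "last_cycle": 4,         # SEQ.MUX/LAST.CYCLE with DISABLE.RETIRE/YES
    "source_queue": 4,
    "field_queue": 4,
    "destination_queue": 3,
    "branch_queue": 3,
    "memory_request": 3,
}


def allowed_after_reset_cpu(operation: str, microwords_after: int) -> bool:
    """Return True if the operation is permitted in the given microword."""
    if microwords_after < 1:
        raise ValueError("count starts at 1 for the microword after RESET.CPU")
    return microwords_after >= EARLIEST_AFTER_RESET_CPU[operation]


for op in ("memory_request", "last_cycle"):
    print(op, [allowed_after_reset_cpu(op, k) for k in range(1, 5)])
```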

8.5.27.18.3 Asynchronous Hardware Error Microtrap Restriction 

There are two possible causes of this microtrap, TB parity error and S3 stall timeout. If the cause 
is S3 stall timeout then the Mbox and Cbox are reset by Ebox hardware for 17.5 cycles. Microcode 
must not issue any memory requests during that reset time period. Also, the Mbox requires that 
the MISC/RESET.CPU function be done during the reset period. The first microword of the microtrap 
handler does not reach S6 until 5 cycles after the S3 stall timeout is detected. Hence the earliest 
the effect of MISC/RESET.CPU on the Mbox can occur is 5 cycles into the 17.5 cycle reset period. 
Microcode currently issues the MISC/RESET.CPU upon entry to the asynchronous hardware error 
microtrap (regardless of the cause) and then waits 23 cycles before beginning normal exception 
handling procedures. This is the recommended procedure. 

8.5.27.18.4 First Part Done Dispatch Restriction 

The microcode flow at the dispatch for PSL<FPD> set must determine if the opcode is that of an 
Fbox instruction. If it is, then a MISC/RESET.CPU must occur before the next SEQ.MUX/LAST.CYCLE 
or SEQ.MUX/LAST.CYCLE.OVERFLOW. This case results in the Fbox and Ebox being out of synch in 
the protocol for sending opcodes and operands. The Fbox must be flushed. If the instruction is not 
an Fbox instruction, microcode may continue without the MISC/RESET.CPU (as it does in the case 
of unpacking and continuing the execution of an interrupted string instruction such as MOVC3). 

8.5.27.19 PSL Use Restrictions

The PSL must not be loaded in the first microword of a macroinstruction execution microflow. 

The first two microwords of any macroinstruction execution microflow (any opcode dispatch or the 
FPD dispatch) should not use the PSL as a source. The PSL<TP> bit read onto E_BUS%ABUS_L<31:0> 
will not necessarily be correct. Microcode may disregard this restriction if it is acceptable for this 
bit to be incorrect. (Reading the PSL does not prevent the automatic copy of <T> to <TP>.) 

The PSL should not be read in the microword after it is updated. If this rule were not followed,
it is UNPREDICTABLE whether the second microword will source the old or the new PSL value.
(Actually it depends on whether an S3 stall occurs on the second microword.)

On loading a new PSL, the third microword after the one altering the PSL may specify LAST.CYCLE
for a decode dispatch, but the first two may not. If it is known that the PSL<FPD, T, or TP> bits
will not change, then this restriction does not apply.

On loading a new value to PSL<FU>, the microword after the one altering the PSL may specify
SEQ.MUX/LAST.CYCLE for a decode dispatch, but the one which altered the PSL must not.

If microcode loads a new value to PSL<IPL> in microword N, then microwords N through N+3
must not specify SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW, but N+4 may.

After changing the PSL, microcode generally should not micro-branch on PSL bits in the next two
microwords. Assuming microword N updates the PSL, if microwords N or N+1 branch on the
PSL the old PSL value will determine the result of the microbranch. However, if microword N+2
branches on the PSL, it is UNPREDICTABLE whether the old or new PSL bits will be used to determine
the branch outcome. (Actually, it is predictable if S3 stalls on microword N+1 are known.) If N+3
branches on the PSL, the new PSL value will definitely determine the result of the microbranch.
This restriction specifically does not apply if PSL<29,26:22> are not changed by the load.

Many microcode flows alter the condition code bits, PSL<3:0>, in the last cycle of the flow. This 
implies that microcode should not source the PSL in the first microwords of any flow except 
microtrap flows (i.e., don't in these flows: opcode dispatch, FPD dispatch, trace fault dispatch, or 
interrupt dispatch) unless it is acceptable that the incorrect value might be read for the condition 
code bits. (This assumes that the first microword of the flow synchronizes to any outstanding 
Fbox retire by specifying a DST other than NONE.) 

Certain restrictions accompany changes to PSL<CUR_MOD>. The Mbox must not be processing
any Ebox references or operand prefetches while PSL<CUR_MOD> is being changed. The microword
after the one changing PSL<CUR_MOD> can issue a memory reference which will be access checked
using the new PSL<CUR_MOD> value.

There are no restrictions on reading or writing the PSL at the beginning of a microtrap flow. The
Ebox pipeline has been flushed before the microtrap flow begins, so there can't be updates to the
PSL after this microflow starts.

The following table summarizes PSL restrictions at beginnings and ends of flows. 

Table 8-27: PSL Restrictions Summary

PSL Bits                       At beginning of new flow [1]   Before end of any flow [2]

PSL<3:0>; PSL<N,Z,V,C> [3]     1 [4]                          0
PSL<4,27>; PSL<T,FPD>          0                              3
PSL<6>; PSL<FU>                0                              1
PSL<20:16>; PSL<IPL>           0                              4
PSL<30>; PSL<TP>               2                              3
PSL<any other>                 1                              0

[1] Number of microwords required at beginning of microflow before microword in which these bits are read. Applies to
macroinstruction execution flows (including FPD dispatches), and to trace fault and interrupt dispatches, but not to
microtrap dispatches.

[2] Number of microwords after one which alters these bits (before and including the one which specifies
SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW).

[3] This assumes the microcode convention of altering the PSL condition code bits in the last microword of some execution
flows.

[4] This assumes that the first microword of the flow synchronizes to any outstanding Fbox retire.
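
The summary table lends itself to a simple lookup. The Python sketch below, with hypothetical names, encodes the two columns of Table 8-27 and returns the governing count when a microword reads or alters several PSL fields at once; taking the maximum over the fields is an assumption that the most restrictive entry governs.

```python
# (microwords required before read at start of flow,
#  microwords required after alteration before LAST.CYCLE), per Table 8-27.
PSL_RESTRICTIONS = {
    "N,Z,V,C": (1, 0),
    "T,FPD": (0, 3),
    "FU": (0, 1),
    "IPL": (0, 4),
    "TP": (2, 3),
    "other": (1, 0),
}


def microwords_before_read(fields):
    """Microwords needed at the start of a flow before these fields are read."""
    return max(PSL_RESTRICTIONS[f][0] for f in fields)


def microwords_before_last_cycle(fields):
    """Microwords needed after altering these fields, up to and including LAST.CYCLE."""
    return max(PSL_RESTRICTIONS[f][1] for f in fields)


print(microwords_before_read(["TP", "T,FPD"]))
print(microwords_before_last_cycle(["FU", "IPL"]))
```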



8.5.27.20 S+PSW Restrictions 

The PSL is written in S5 while the S+PSW source is read in S3. If microword N updates the PSL, 
microword N+l should not source S+PSW. It is UNPREDICTABLE whether the old or new value would 
be sourced if this restriction were not obeyed. 

8.5.27.21 RN.MODE.OPCODE Restrictions 

For the RN field to be valid, the A field of the microword must specify S1 (the current source queue
entry), and the microcode must know from context that the source queue entry points to a GPR.
If these restrictions are not met, the value returned in the RN field is UNPREDICTABLE.

The PSL is written in S5 while the RN.MODE.OPCODE source is read in S3. If microword N updates
the PSL, microword N+1 must not source the new value of RN.MODE.OPCODE. It could receive the
old value.

8.5.28 Signal Name Cross-Reference

The following table gives a cross reference for selected signal names in this chapter. Only signal 
names which have different names in this chapter than they do on the schematics are listed. 
Different names are used in this chapter only where the resulting description is significantly 
clearer. 



Table 8-28: Signal Name Cross-Reference

Name in this chapter    Name on schematics    Name in behavioral model



K^ALU%BKSULT_H<aia5> 
EJU.U%BJEStJIirjB<l*D> 
K ^ALU%CIL_H<S3> 
K^ALtJ*CI_H<31^*7^B, 

22,20, 18,16> 

K >LU%CI w H<14,ia,10A8,O> 
E J*lLU%C1^<1S,UA7AS,1> 
E_ALU%CI w H<0> 
E%KMUX_S4^FLU8H 

K%HMUX_S4_8TAIX 



E%RMUX W S6 JFLTJBH 



B%8Sjn.USH 



E%SS_STAIJL 



E*S4_STAIX 



E_ / ALU - AD2*RJ>81«18> 

E_>LU>m*RJL<14<0> 

E_ALUAD2*CIL.L«32> 

E_ALU J ADSf»C^L<81^B^7^5, 
23^1,19,17 t 15> 

E^AL,U>M»CIJB<S<V»^M*. 
22^0,18,16> 

E_>a.u^ADi«cija<i4,ia,i0A«,O> 

E_ALU J ADl*CIJL<ia,llA7,5Al> 



no exact match, roughly equals 
the following: 
E_8TL«VERY JLATBLNOP_RMUX_S4_J. 

no exact match, roughly equals 
the following: 
E_8TL%8TALI^RMUX_84JL ) 
E_STLfla^rE_STALL t _BMUX_S< J L, 
E_STl.%VEaY_^TE_SXAlX u RMUX_84J. 

no exact match, 

roughly equals the following: 

E_STLWOP_HMCX_S8J-, 
E_8TLW_NOP_BMlOL88JH 

no exact 
match, roughly equals the 
following: ejdsq*pe>bortjl, 

E_STF«H»K_ABORT_^, 
E_STL%*>PE_ABORT_H 

no exact 
match, roughly equals the 
following: E_8TL%8TXiJL_sa_L, 

E_ST1.%LATE_STALL 1 _88JL, 
E_8TX%VKKY W LATE_8TALXJS3 JL 

no exact 
match, roughly equals the 

following: E_STL*STAIX_S<_L, 
E_STI/H,ATE_STALL_S4JL 



E_ASEL>LU%HESUL,Tja<Sl>15> 

E^ASHJVLU%RK8ULTJH<14«0> 

E_ASBLALU'%CAHaJDKS_OUT_H<Sl> 

E _>SHJUA?%CARRIKS_OUT_H<a0^8^34, 
22^0,18,ie,14> 

E_ASH w ALU*CABBEK8_OOT_H-<aft^7^8, 
23^1,1»,17,18> 

E^a3BLALU%CABHIK8_OUT_H<lS,ll^,7A»4> 
E ^ASH^ALU%CARBIE8_OUT_H<a2,10A«,*^> 
E.>SHJ*LU%CJN_H 

no exact match, roughly equals the following: 
E_STL%LATE_PJ«)P_HMUX_84_H, 
E_8TL%VEHT w LATKJ*OP_BMDX_84 _JH 

no exact match, roughly equals the following: 

E_STL%8TAlX L JBMro3LS4_H, 
E_STL%LATE_STAIJUJtMUX_84 
E_STL%VEHT_JATK_8TALXJIMUX W S4_H 

no exact match, roughly equals the following: 

E_S*rL%FJTOP_BMUX_S*_H, 
E_8TL*IATK W F_>»P_KMUX W 8«_H, 
E_SrrL«NOPJBBTOX.SBJS 

no exact match, roughly equals the following: 

E_US<l%P1I_ABORTja 



no exact match, roughly equals the following: 
E_8TL%8TAULJ3S_H, E_STLWLA3S.S1VULL U 83^, 
E_8Tl.*VKRYJLATE_8TAm_8SJt 



no exact match, roughly equals the following: 

Table 8-28 (Cont.): Signal Name Cross-Reference

Name in this chapter   Name on schematics              Name in behavioral model

E%S4_FLUSH             no exact match, roughly         no exact match, roughly
                       equals the following:           equals the following:
                       E_STL%F_NOP_S4_H,               E_STL%F_NOP_S4_H,
                       E_STL%LATE_F_NOP_S4_H           E_STL%LATE_F_NOP_S4_H

E%S5_FLUSH             no exact match, roughly         no exact match, roughly
                       equals the following:           equals the following:
                       E_STL%NOP_S5_L,                 E_STL%NOP_S5_H,
                       E_STL%F_NOP_S5_H                E_STL%F_NOP_S5_H

8.5.29 Revision History

Table 8-29: Revision History

Who              When          Description of change

John Edmondson   30-NOV-1988   Initial Release.
John Edmondson   19-DEC-1988   Corrections and Updates.
John Edmondson   06-MAR-1989   Release for external review.
John Edmondson   29-NOV-1989   Updates after external review and modeling complete.
John Edmondson   18-DEC-1989   Further updates, particularly adding real signal names.
John Edmondson   31-JAN-1990   Updates reflecting minor implementation motivated changes
                               - rev 0.5.
John Edmondson   4-MAY-1990    Updates reflecting minor implementation motivated changes
                               - post rev 0.5.
John Edmondson   20-FEB-1991   Further updates post implementation.
John Edmondson   31-MAY-1991   Minor updates for pass 2 changes.

Chapter 9

The Microsequencer 



9.1 Overview 

The microsequencer is a microprogrammed finite state machine that controls the three Ebox 
sections of the NVAX pipeline: S3, S4, and S5. The microsequencer itself resides in the S2 section 
of the pipeline. It accesses microcode contained in an on-chip control ROM, and microcode patches 
contained in an on-chip SRAM. Each microword is made up of fields that control all three pipeline 
stages. A complete microword is issued to S3 each cycle, and the appropriate microword decodes 
are pipelined forward to S4 and S5 under Ebox control. 

Each microword contains a microsequencer control field that specifies the next microinstruction 
in the microflow. This field may specify an explicit address contained in the microword or direct
the microsequencer to accept an address from another source. It also allows the microcode to 
conditionally branch on various NVAX states. 

Frequently used microcode can be made into microsubroutines. When a microsubroutine is called, 
the return address is pushed onto the microstack. Up to six levels of subroutine nesting are 
possible. 

Stalls, which are transparent to the microcoder, occur when an NVAX resource is unavailable, 
such as when the ALU requires an operand that has not yet been provided by the Mbox. The
microsequencer stalls when S3 of the Ebox is stalled. 

Microtraps allow the microcoder to deal with abnormal events that require immediate service.
For example, a microtrap is requested on a branch mispredict, when the Ebox branch calculation 
is different from that predicted by the Ibox for a conditional branch instruction. When a microtrap 
occurs, the microcode control is transferred to a service microroutine. 

9.2 Functional Description 
9.2.1 Introduction 

The NVAX microsequencer consists of several functional units of logic that are explained in the 
following sections and illustrated in the block diagram, Figure 9-1. 

9.2.2 Control Store

The control store is an on-chip ROM which contains the microcode used to execute macroinstructions
and microtraps. It is made up of up to 1600 microwords. These are arranged as 200 entries,
each entry consisting of 8 microwords. Each microword is 61 bits long, with bits <14:0> being
used to control the microsequencer. The remainder of the microword, bits <60:15>, is used by the
Ebox to control S3 through S5. The Ebox also receives bits <14,12:11>, enabling it to recognize
the last cycle of a microflow and the validity of the microtest bus select lines.

The control store access is performed during Φ34 of S2 and Φ1 of S3 of the NVAX pipeline. The
output of the Current Address Latch (CAL), E_USQ_CAL%CAL_H<10:0>, is used to address the
control store. Bits <10:4,0> are used to select one of the 200 entries. The eight microwords in the
selected entry then enter an eight-way multiplexer, where E_USQ_CAL%CAL_H<3:1> select the final
control store output. This structure is used because E_USQ_CAL%CAL_H<3:1> are valid later than
bits <10:4,0>, since E_USQ_CAL%CAL_H<3:1> must be OR'd with the microtest bus for a BRANCH
format microinstruction (see Section 9.2.2.2.2 for details).
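
The two-step selection described above can be sketched in Python as a split of the 11-bit CAL value. The function name is hypothetical, and the sketch only concatenates bits <10:4,0> into an 8-bit entry-select value; how the ROM maps that value onto its 200 physical entries is not stated here and is not modeled.

```python
def split_cal(cal: int) -> tuple:
    """Split an 11-bit CAL value into (entry_select, word_select).

    entry_select: CAL<10:4> concatenated with CAL<0> (8 bits, available early).
    word_select:  CAL<3:1> (3 bits, available late, drives the 8-way mux).
    """
    if not 0 <= cal < (1 << 11):
        raise ValueError("CAL is 11 bits wide")
    word_select = (cal >> 1) & 0x7
    entry_select = ((cal >> 4) << 1) | (cal & 0x1)
    return entry_select, word_select


# 200 entries of 8 microwords each give the stated maximum of 1600 microwords.
assert 200 * 8 == 1600
print(split_cal(0b10000001011))
```

Keeping bits <3:1> out of the entry select is what lets a BRANCH format microword OR the microtest bus into those bits late in the cycle without delaying the ROM access.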

9.2.2.1 Patchable Control Store

The patchable control store is an on-chip SRAM which contains microcode patches. It consists of
up to 20 microwords. It operates in parallel with the control store. The microaddress from the
CAL is the input to its CAM (Content Addressable Memory). If the address hits in the CAM, the
output of the patchable control store is selected as the new microword, rather than the output of
the ROM control store.

The patchable control store and CAM are precharged in Φ3 and evaluate in Φ41. The CAL output,
E_USQ_CAL%CAL_H<10:0>, is used in its entirety as the lookup address in the CAM, as opposed to
the 1-of-200 selection followed by the 1-of-8 selection used in the ROM control store.
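
The patch override can be pictured with a short Python model. This is a behavioral sketch only, with hypothetical names: the ROM and the CAM-addressed patch store are modeled as dictionaries keyed by the full 11-bit microaddress, and timing is ignored.

```python
MAX_PATCHES = 20  # the patchable control store holds up to 20 microwords


def select_microword(cal, rom, patches, pcs_enabled):
    """Return the microword for microaddress cal.

    rom and patches map microaddresses to microwords; a CAM hit
    (cal present in patches while the PCS is enabled) overrides the ROM.
    """
    if len(patches) > MAX_PATCHES:
        raise ValueError("too many patch entries")
    if pcs_enabled and cal in patches:
        return patches[cal]
    return rom[cal]


rom = {0x100: "rom microword", 0x101: "another rom microword"}
patches = {0x100: "patched microword"}
print(select_microword(0x100, rom, patches, pcs_enabled=True))
print(select_microword(0x101, rom, patches, pcs_enabled=True))
```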

9.2.2.1.1 Loading the Patchable Control Store

Entries in the Patchable Control Store and its CAM are written under software control from
the Patchable Control Store Control Register (PCSCR) in the Ebox. The CAM must be disabled
during this operation, so that no hits can occur. This is done by writing a zero to
PCSCR<PCS_ENB>. In addition, Parallel Test Port control of the MIB scan chain must be disabled, by writing
a one to PCSCR<PAR_PORT_DIS>. Following assertion of ELB%RESET_L, PCSCR<PCS_ENB>
and PCSCR<PAR_PORT_DIS> both contain zeroes.

Data is serially scanned into the MIB scan chain, in the order shown in Table 9-2 (data is shifted
from bit 0 to bit 91). The data is taken from PCSCR<DATA>; shifting into the scan chain is
enabled by PCSCR<RWL_SHIFT>.

The final 20 bits scanned in (positions <19:0> in the scan chain) are used to select which entry in
the patchable control store is to be written. Only one of these 20 bits may be asserted at a time.
When all 92 bits of the scan chain have been serially loaded, the selected patchable control store
and CAM entry are written under control of PCSCR<PCS_WRITE>.
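
As a rough illustration of the scan chain layout, the Python sketch below assembles a 92-bit chain value from a one-hot entry select and returns the bits in the order they are shifted in. All names are hypothetical. It assumes that entry n corresponds to PCS ENTRY SELECT<n>, and it treats positions <91:20> (microword and CAM address) as an opaque 72-bit value already arranged per Table 9-2.

```python
CHAIN_BITS = 92
SELECT_BITS = 20


def build_scan_chain(entry: int, upper_bits: int) -> int:
    """Combine positions <91:20> (upper_bits) with a one-hot select in <19:0>."""
    if not 0 <= entry < SELECT_BITS:
        raise ValueError("entry select must name one of 20 entries")
    if not 0 <= upper_bits < (1 << (CHAIN_BITS - SELECT_BITS)):
        raise ValueError("positions <91:20> hold 72 bits")
    return (upper_bits << SELECT_BITS) | (1 << entry)  # exactly one select bit set


def shift_order(chain: int) -> list:
    """Bits in the order written to PCSCR<DATA>: position <91> first, <0> last."""
    return [(chain >> pos) & 1 for pos in range(CHAIN_BITS - 1, -1, -1)]


bits = shift_order(build_scan_chain(entry=3, upper_bits=0))
print(len(bits), sum(bits), bits[-4:])
```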

All patchable control store entries must be written with either valid or NULL patches before
the PCS is enabled. A NULL patch is an entry whose CAM location is written with an
unused/unreferenced microaddress; there can never be a hit on this microaddress. The values of
the MIB bits in a NULL patch are don't-care.

When the patchable control store is loaded, the patch revision must be loaded into
PCSCR<PATCH_REV>. If the patch is non-standard (i.e., one which is not a formally distributed patch, such as
a performance analysis patch), PCSCR<NONSTANDARD_PATCH> must be set to 1; otherwise
it must be set to 0. These fields can be read by software to determine which patches are present
in the machine. These fields are included in reads of the SID processor register.

Enabling of the patchable control store is done by writing a zero to PCSCR<PAR_PORT_DIS>
and then writing a one to PCSCR<PCS_ENB>.

See Section 8.5.22.1 for more details on PCSCR operation. 

The following table shows an example of writing an entry in the patchable control store. 
Table 9-1: Example: Writing an Entry in the Patchable Control Store

Phase   Action

Microcycle 1
1
2
3       Write 0 to PCSCR<PCS_ENB> [1] (disable the CAM)
        CAM NOW DISABLED
        Write a 1 to PCSCR<PAR_PORT_DIS> [1] (disable parallel port control)
4

Microcycle 2
1
2       PARALLEL PORT CONTROL NOW DISABLED [2]
3       Write data for MIB scan chain bit<91> to PCSCR<DATA> [1]
        Write 1 to PCSCR<RWL_SHIFT> [1]
4

Microcycle 3
1
2
3       Write data for MIB scan chain bit<90> to PCSCR<DATA>
        Write 1 to PCSCR<RWL_SHIFT>
4       Data for MIB scan chain bit<91> shifted into MIB scan chain bit<0>

Microcycle 4
1
2
3       Write data for MIB scan chain bit<89> to PCSCR<DATA>
        Write 1 to PCSCR<RWL_SHIFT>
4       Data for MIB scan chain bit<90> shifted into MIB scan chain bit<0>
        Data for MIB scan chain bit<91> shifted into MIB scan chain bit<1>

...

Microcycle 94
1
2
3       Write 1 to PCSCR<PCS_WRITE> [1] (write data into patchable control store)
4       Data for MIB scan chain bit<91> shifted into MIB scan chain bit<91>

Microcycle 95
1
2
3
4       DATA WRITTEN INTO PCS ENTRY FROM MIB SCAN CHAIN [2]

[1] An S5 operation.

[2] Note 1-cycle delay between some PCSCR fields and MIB scan chain.



Note that this example assumed no stalls within the Ebox. Also note that
PCSCR<PCS_ENB> and PCSCR<PAR_PORT_DIS> must be re-written with the correct values every cycle
that PCSCR<DATA> is written.
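
The write sequence of Table 9-1 can be laid out as a schedule. The Python sketch below is a hypothetical model, not microcode: it assumes no Ebox stalls, represents each PCSCR write as a dictionary of field values, and assumes that the PCS_WRITE write also carries the PCS_ENB and PAR_PORT_DIS values.

```python
def pcscr_write_schedule(chain_bits):
    """Return a list of (microcycle, pcscr_fields) for loading one PCS entry.

    chain_bits: 92 bits in shift order, scan chain position <91> first.
    """
    if len(chain_bits) != 92:
        raise ValueError("the MIB scan chain holds 92 bits")
    hold = {"PCS_ENB": 0, "PAR_PORT_DIS": 1}  # re-written on every write
    schedule = [(1, dict(hold))]              # microcycle 1: disable CAM and parallel port
    for i, bit in enumerate(chain_bits):      # microcycles 2..93: one data bit per cycle
        schedule.append((2 + i, dict(hold, DATA=bit, RWL_SHIFT=1)))
    schedule.append((94, dict(hold, PCS_WRITE=1)))  # entry is written in microcycle 95
    return schedule


sched = pcscr_write_schedule([0] * 92)
print(len(sched), sched[1][0], sched[-2][0], sched[-1][0])
```

Under these assumptions the data for position <k> is written in microcycle 2 + (91 - k), which matches the rows of Table 9-1 for bits <91>, <90>, and <89>.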



Table 9-2: Contents of MIB Scan Chain, When Loading Patchable Control Store

Position   Description              Comment

<91>       MIB_H<0>                 Microword Field BRANCH.OFFSET [1]
<90>       MIB_H<1>                 "
<89>       MIB_H<2>                 "
<88>       MIB_H<3>                 "
<87>       MIB_H<4>                 "
<86>       MIB_H<5>                 "
<85>       MIB_H<6>                 "
<84>       MIB_H<7>                 "
<83>       MIB_H<34>                Microword Field L
<82>       MIB_H<49>                Microword Field MISC1
<81>       MIB_H<48>                "
<80>       MIB_H<47>                "
<79>       MIB_H<46>                "
<78>       MIB_H<60>                Microword Field FMT
<77>       MIB_H<19>                Microword Field MISC
<76>       MIB_H<18>                "
<75>       MIB_H<17>                "
<74>       MIB_H<16>                "
<73>       MIB_H<15>                "
<72>       MIB_H<31>                Microword Field DST
<71>       MIB_H<30>                "
<70>       MIB_H<29>                "
<69>       MIB_H<28>                "
<68>       MIB_H<27>                "
<67>       MIB_H<26>                "
<66>       MIB_H<25>                Microword Field A
<65>       MIB_H<24>                "
<64>       MIB_H<23>                "
<63>       MIB_H<22>                "
<62>       MIB_H<21>                "
<61>       MIB_H<20>                "
<60>       CAM MICROADDRESS<10>     Microaddress to be patched
<59>       CAM MICROADDRESS<9>      "
<58>       CAM MICROADDRESS<8>      "
<57>       CAM MICROADDRESS<7>      "
<56>       CAM MICROADDRESS<6>      "
<55>       CAM MICROADDRESS<5>      "
<54>       CAM MICROADDRESS<4>      "
<53>       CAM MICROADDRESS<3>      "
<52>       CAM MICROADDRESS<2>      "
<51>       CAM MICROADDRESS<1>      "
<50>       CAM MICROADDRESS<0>      "
<49>       MIB_H<10>                Microword Field SEQ.COND
<48>       MIB_H<9>                 "
<47>       MIB_H<8>                 "
<46>       MIB_H<14>                Microword Field SEQ.FMT
<45>       MIB_H<13>                Microword Field SEQ.CALL
<44>       MIB_H<12>                Microword Field SEQ.COND
<43>       MIB_H<11>                "
<42>       MIB_H<39>                Microword Field B
<41>       MIB_H<38>                "
<40>       MIB_H<37>                "
<39>       MIB_H<36>                "
<38>       MIB_H<35>                "
<37>       MIB_H<44>                Microword Field MISC2
<36>       MIB_H<43>                "
<35>       MIB_H<42>                "
<34>       MIB_H<41>                "
<33>       MIB_H<45>                Microword Field LIT
<32>       MIB_H<40>                Microword Field D
<31>       MIB_H<54>                Microword Field MRQ
<30>       MIB_H<53>                "
<29>       MIB_H<52>                "
<28>       MIB_H<51>                "
<27>       MIB_H<50>                "
<26>       MIB_H<33>                Microword Field W
<25>       MIB_H<32>                Microword Field V
<24>       MIB_H<59>                Microword Field ALU
<23>       MIB_H<58>                "
<22>       MIB_H<57>                "
<21>       MIB_H<56>                "
<20>       MIB_H<55>                "
<19>       PCS ENTRY SELECT<19>     Entry in PCS to be written
<18>       PCS ENTRY SELECT<18>     "
<17>       PCS ENTRY SELECT<17>     "
<16>       PCS ENTRY SELECT<16>     "
<15>       PCS ENTRY SELECT<15>     "
<14>       PCS ENTRY SELECT<14>     "
<13>       PCS ENTRY SELECT<13>     "
<12>       PCS ENTRY SELECT<12>     "
<11>       PCS ENTRY SELECT<11>     "
<10>       PCS ENTRY SELECT<10>     "
<9>        PCS ENTRY SELECT<9>      "
<8>        PCS ENTRY SELECT<8>      "
<7>        PCS ENTRY SELECT<7>      "
<6>        PCS ENTRY SELECT<6>      "
<5>        PCS ENTRY SELECT<5>      "
<4>        PCS ENTRY SELECT<4>      "
<3>        PCS ENTRY SELECT<3>      "
<2>        PCS ENTRY SELECT<2>      "
<1>        PCS ENTRY SELECT<1>      "
<0>        PCS ENTRY SELECT<0>      "

[1] See Chapter 6 for details on microword fields.


9.2.2.2 Microsequencer Control Field of Microcode 

The microsequencer control field of the NVAX microword is used to help select the next microword 
address. The next address source is explicitly coded in the current microword; there is no concept 
of sequential next address. 

The SEQ.FMT field, bit <14> of the microsequencer control field, selects between the following 
two formats: 

Figure 9-2: Microcode Microsequencer Control Field Formats 



         14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
JUMP    | 0|  |     |                J               |
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
          |  |    |
          |  |    +--- SEQ.MUX
          |  +--- SEQ.CALL
          +--- SEQ.FMT

         14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
BRANCH  | 1|  |   SEQ.COND   |     BRANCH.OFFSET     |
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
          |  |
          |  +--- SEQ.CALL
          +--- SEQ.FMT

Table 9-3: Jump Format Control Field Definitions

Name             Extent    Description

SEQ.FMT          14        0 for JUMP
SEQ.CALL         13        Controls whether return address is pushed on microstack
SEQ.MUX          12:11     Selects source of next microaddress
J                10:0      JUMP target address


Table 9-4: Branch Format Control Field Definitions

Name             Extent    Description

SEQ.FMT          14        1 for BRANCH
SEQ.CALL         13        Controls whether return address is pushed on microstack
SEQ.COND         12:8      Selects source of Microtest Bus
BRANCH.OFFSET    7:0       Page offset of next microinstruction
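
The two field layouts in Tables 9-3 and 9-4 can be illustrated with a small software decoder. This is only a sketch for readers, not part of the chip design; the function name `decode_seq_control` and the dictionary keys are illustrative assumptions.

```python
def decode_seq_control(field: int) -> dict:
    """Split a 15-bit microsequencer control field into named subfields.

    Bit positions follow Tables 9-3 and 9-4; the function itself is a
    hypothetical helper, not a description of the hardware decoder.
    """
    if not 0 <= field < (1 << 15):
        raise ValueError("control field must fit in 15 bits")
    seq_fmt = (field >> 14) & 0x1    # SEQ.FMT: 0 = JUMP, 1 = BRANCH
    seq_call = (field >> 13) & 0x1   # SEQ.CALL: push return address
    if seq_fmt == 0:
        return {
            "format": "JUMP",
            "SEQ.CALL": seq_call,
            "SEQ.MUX": (field >> 11) & 0x3,   # bits <12:11>
            "J": field & 0x7FF,               # bits <10:0>
        }
    return {
        "format": "BRANCH",
        "SEQ.CALL": seq_call,
        "SEQ.COND": (field >> 8) & 0x1F,      # bits <12:8>
        "BRANCH.OFFSET": field & 0xFF,        # bits <7:0>
    }
```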



9.2.2.2.1 Jump Format 

Jump format microinstructions choose the next address from one of three possible sources: the J 
field (bits<10:0> of the current microword), the microstack, or the last cycle logic. The microword 
fields decode as follows: 



Table 9-5: Jump Format Control Field Decodes

                        NEXT ADDRESS
SEQ.CALL    SEQ.MUX     SOURCE              REMARKS

0           0           J                   JUMP microinstruction.
1           0           J                   CALL microinstruction. Current microword address
                                            with bits <3:0> incremented by one is pushed onto
                                            microstack.
X           1           STACK               RETURN microinstruction. Top entry of microstack
                                            is selected.
X           2           Last Cycle Logic    Last cycle. Select next microflow.
X           3           Last Cycle Logic    Last cycle and enable integer overflow trap. Select
                                            next microflow.



On a CALL microinstruction, the address of the current microinstruction, with bits <3:0> 
incremented by one, is pushed onto the Microstack. The CALL address is modified to avoid 
a RETURN to the CALL address, which would cause an infinite loop. 
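
As an illustration of the return-address computation described above, the following sketch increments only bits <3:0> of the current microaddress. The function name is hypothetical, and the wrap-around of the 4-bit increment (F becoming 0 with no carry into bit <4>) is an assumption; the text says only that bits <3:0> are incremented by one.

```python
def call_return_address(current: int) -> int:
    """Return address pushed on the microstack by a CALL (sketch).

    Only bits <3:0> are incremented; bits <10:4> are left unchanged.
    Assumption: the 4-bit increment wraps from 0xF to 0x0 without
    carrying into bit <4>.
    """
    if not 0 <= current < (1 << 11):
        raise ValueError("microaddress must fit in 11 bits")
    low = (current + 1) & 0xF          # incremented bits <3:0>
    return (current & 0x7F0) | low     # bits <10:4> preserved
```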
9.2.2.2.2 Branch Format 

Branch format microinstructions allow the microcoder to perform CASE operations on NVAX 
state. The SEQ.COND field drives the microtest bus select lines which select the source 
that drives the microtest bus. (Refer to Section 9.2.3.1.1 for details.) The microtest bus 
is OR'd with bits <3:1> of the BRANCH.OFFSET field, allowing up to an eight-way case. 
Casing may be reduced to two-way or four-way by setting to ones the appropriate bits in 
BRANCH.OFFSET<3:1>. 



Table 9-6: Branch Format Control Field Decodes

            NEXT ADDRESS
SEQ.CALL    SOURCE           REMARKS

0           BRANCH.OFFSET    BRANCH microinstruction.
1           BRANCH.OFFSET    CONDITIONAL CALL microinstruction. Current
                             microword address with bits <3:0> incremented by one is
                             pushed onto microstack.



As in the JUMP format, the SEQ.CALL field is used to indicate that a RETURN address must 
be pushed on the microstack. 

For the purposes of BRANCH microinstructions, the control store is divided into 256-microword 
pages. The target of a branch microinstruction must be in the same page as the BRANCH as 
only the least significant 8 bits of the address are modified. The BRANCH.OFFSET field is the 
destination address offset within the current page. 

A branch address is made up as follows:

Table 9-7: Branch Address Formation

Bit(s)    Source

<10:8>    Current Address<10:8>
<7:4>     BRANCH.OFFSET<7:4>
<3:1>     BRANCH.OFFSET<3:1> OR UTEST<2:0>
<0>       BRANCH.OFFSET<0>
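
The branch address formation in Table 9-7 can be written out as bit operations. The sketch below is illustrative only; `branch_address` is a hypothetical name, and `utest` is taken as the logical (true-high) value of the microtest bus, although the physical bus is named as an active-low signal.

```python
def branch_address(current: int, offset: int, utest: int) -> int:
    """Form the 11-bit branch target per Table 9-7 (sketch).

    current: 11-bit current microaddress (supplies the page, bits <10:8>)
    offset:  8-bit BRANCH.OFFSET field
    utest:   3-bit logical microtest bus value (assumed true-high here)
    """
    page = current & 0x700                               # bits <10:8>
    high = offset & 0xF0                                 # bits <7:4>
    case = (((offset >> 1) & 0x7) | (utest & 0x7)) << 1  # bits <3:1>
    low = offset & 0x1                                   # bit <0>
    return page | high | case | low
```

Setting bits in BRANCH.OFFSET<3:1> to ones masks the corresponding microtest bits, which is how an eight-way case is reduced to a four-way or two-way case.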



9.2.2.3 MIB Latches 

The microword output from the Control Store 8-to-1 multiplexer is latched in Φ1 into the Control 
Store Microinstruction Buffer (CS_MIB) latch. The microword output from the Patchable Control 
Store is also latched into the PCS_MIB latch. The outputs of the CS_MIB and PCS_MIB 
latches drive a multiplexer, which selects the PCS_MIB output if the CAL output hit in the 
Patchable Control Store CAM; otherwise, the multiplexer selects the CS_MIB output. 

Bits <14:0> of the multiplexer output (the Microsequencer 
Microinstruction, E_USQ_CSM%UMIB_H<14:0>) are driven back to the microsequencer; all bits are 
driven to the Microinstruction Buffer (MIB) latch, which operates in Φ2. Bits <60:14,12:11> of 
the MIB latch output (E_USQ%MIB_H) are driven to S3 of the Ebox; all bits are driven to the MIB 
scan chain (see Section 9.5.2). When a microtrap is detected, the contents of the MIB latch are 
forced to NOP. The MIB latch is stalled on a microsequencer stall. 

9.2.3 Next Address Logic 

The remainder of the microsequencer is devoted to determining the next control store lookup 
address. There are five next address sources: 

1. JUMP/BRANCH.OFFSET field of Microword 

2. Microtrap Logic 

3. Last Cycle Logic 

4. Microstack 

5. Test Address Generator 



9.2.3.1 CAL and CAL INPUT BUS 

The CAL, or Current Address Latch, is a static latch which holds the 11-bit address used to access 
the control store. It operates in Φ3, and is stalled on a microsequencer stall. Bits <10:8> are also 
"stalled" when forming a branch address (see Table 9-7). 

The input to the CAL is the CAL Input Bus (E_USQ_BUS%CAL_INPUT_L). The CAL Input Bus 
is a dynamic bus, precharged in Φ2. The selected next address source drives this bus in Φ3. 
Bits <14,12:11> of the microsequencer control field are used in selecting three of the next 
address sources: E_USQ_CSM%UMIB_H<10:0> (for a BRANCH or JUMP address), the output of 
the last cycle logic, and the microstack output. The fourth CAL Input Bus source is the 
microtrap address; if a microtrap is detected, this input is selected regardless of the value of 
E_USQ_CSM%UMIB_H<14,12:11>. The fifth source is a test address, driven from the Test Address 
Generator. This input has the highest priority. In summary: 



Table 9-8: Current Address Selection

TEST    TRAP        SEQ.FMT    SEQ.MUX    NEXT ADDRESS
ADDR    DETECTED    <14>       <12:11>    SOURCE                    REMARKS

0       0           1          XX         Branch Address (1)        BRANCH/CONDITIONAL CALL microinstructions
0       0           0          00         J                         JUMP/CALL microinstructions
0       0           0          01         Microstack                RETURN microinstruction
0       0           0          1X         Last Cycle Logic          Start new microflow
0       1           X          XX         Microtrap Logic           Microtrap
1       X           X          XX         Test Address Generator    Test address

(1) See Table 9-7

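
The selection priority described above (test address first, then microtrap, then the source named by the microword) can be sketched as a simple function. This is an illustrative model only; the function name and the returned strings are assumptions chosen for readability.

```python
def select_next_address_source(test_addr: bool, trap_detected: bool,
                               seq_fmt: int, seq_mux: int) -> str:
    """Pick the CAL Input Bus source for the next cycle (sketch).

    The test address has the highest priority, then a detected microtrap;
    otherwise SEQ.FMT<14> and SEQ.MUX<12:11> of the current microword decide.
    """
    if test_addr:
        return "Test Address Generator"
    if trap_detected:
        return "Microtrap Logic"
    if seq_fmt == 1:
        return "Branch Address"        # BRANCH / CONDITIONAL CALL
    if seq_mux == 0:
        return "J"                     # JUMP / CALL
    if seq_mux == 1:
        return "Microstack"            # RETURN
    return "Last Cycle Logic"          # SEQ.MUX = 2 or 3
```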
9.2.3.1.1 Microtest Bus 

The microtest bus allows conditional branches and conditional calls based on information 
generated outside the microsequencer, such as Ebox condition codes. The SEQ.COND field of 
the BRANCH format is driven on the microtest select lines, E_USQ%UTSEL_H<4:0>, in Φ23. These 
lines are decoded by all conditional information sources in the Ebox. The selected source drives 
its information on the microtest bus, E_BUS%UTEST_L<2:0>. E_BUS%UTEST_L must be valid in time 
to be OR'd with the value on the CAL Input Bus and latched in the CAL in Φ3. 

The sources for the microtest bus are as follows: 



Table 9-9: Microtest Bus Sources

UTSEL<4:0>    Select              UTEST<2:0>

00            No source           000
01            ALU.NZV (2)         ALU_CC.N, ALU_CC.Z, ALU_CC.V
02            ALU.NZC (2)         ALU_CC.N, ALU_CC.Z, ALU_CC.C
03            B.2-0 (1)           EB_BUS<2:0>
04            B.5-3 (1)           EB_BUS<5:3>
05            A.7-5 (1)           EA_BUS<7:5>
06            A.15-12 (1)         EA_BUS<15:14>, EA_BUS<13> OR EA_BUS<12>
07            A31.BQA.BNZ1 (1)    EA_BUS<31>, EB_BUS<2:0> = 0, EB_BUS<15:8> NEQ 0
08            MPU.0-6 (2)         MPU0_6<2:0>
09            MPU.7-13 (2)        MPU7_13<2:0>
0A            STATE.2-0 (2)       STATE<2:0>
0B            STATE.5-3 (2)       STATE<5:3>
0C            OPCODE.2-0 (1)      OPCODE<2:0>
0D            PSL.26-24 (3)       PSL<26:24>
0E            PSL.29.23-22 (3)    PSL<29>, PSL<23:22>
0F            SHF.NZ (2), INT     SHF_CC.N, SHF_CC.Z, INTERRUPT_REQUEST
10            VECTOR,TEST         ECR<VECTOR_UNIT_PRESENT> (3), TEST DATA, TEST STROBE
11            FBOX                Encoded fault<1:0> (4), ECR<FBOX_ENABLED> = 0 (3)
12            FQ.VR (1)           0, FIELD_QUEUE_NOT_VALID, FIELD_QUEUE_RMODE
13-1F         Not Used

(1) Data is taken from S3.
(2) Data is taken from S4.
(3) Data is taken from S6.
(4) See Section 8.5.19.7.



The microtest select lines are always driven with bits <12:8> of the MIB output regardless of the 
microinstruction format. The microtest bus is only OR'd with the CAL Input Bus if the BRANCH 
source is selected to drive that bus. 
Two of the microtest sources, the Field Queue (FQ) and the Mask Processing Unit (MPU), perform 
some function based on the value of the microtest select lines. These functions must check 
SEQ.FMT, E_USQ%MIB_H<14>, for validity of the microtest select lines. 

The microtest select lines are precharged to a value of zero during Φ1; no microtest source is 
selected for this value. 

9.2.3.2 Microtrap Logic 

Microtraps allow the microcoder to deal with abnormal events that require immediate service. 
When a microtrap occurs, the microcode control is transferred to a service microroutine. 
Operations further behind in the pipe than the one which caused the microtrap are aborted. 

Microtraps are generated by the Ebox, Mbox, or Ibox. Those Ebox microtrap requests considered 
faults are asserted in S4 of the microinstruction in which they occurred. Those that are considered 
traps are asserted in S5 of the microinstruction in which they occurred. 

Microtraps have higher priority than all other next address sources except the Test Address 
Generator. Microtraps are detected in Φ4. The microtrap signals are OR'd together in Φ1 to form 
E_USQ%PE_ABORT_L. The trap signals are prioritized and address lookup is done to select the 
appropriate microtrap handler address, which is driven on the CAL Input Bus in Φ3. 

Since microtraps are not detected until Φ4, too late for control store access in that cycle, the 
signal E_USQ%PE_ABORT_L is used to force NOPs in all the Ebox and microsequencer inter-stage 
latches in Φ1 and Φ2. This effectively flushes the pipe. In the cycle following microtrap detection, 
control store access is done using the microtrap handler address, and the first microword of the 
trap handler is driven to S3 on E_USQ%MIB_H. 

Microtrap microcode flows flush the Ebox, Fbox, the specifier queue in the Mbox, the Instruction 
Queue in the microsequencer, and the Ibox. The only exception to this is the branch mispredict 
microtrap, which does not flush the Ibox. The microtrap handler also loads a new PC which 
allows the Ibox to start prefetching. At the end of the microtrap, microcode control is returned 
to the last cycle logic. 

Microtrap signals must be asserted for only one cycle, to prevent multiple detections of the same 
trap. 

9.2.3.2.1 Microtraps 

1. Powerup 

The powerup microtrap is requested when the chip is powered up. This forces the internal 
state of the chip to a known condition. See Chapter 16 for details. 

2. Asynchronous Hardware Error 

The asynchronous hardware error microtrap request can happen at any time regardless of 
what is in the pipeline. The following conditions cause execution of this microtrap: 

• S3 Stall Timer Expiration 

The S3 stall timer counts the number of consecutive cycles that S3 is stalled. When 
the counter reaches its limit, it initiates the Asynchronous Hardware Error microtrap by 
asserting E_TIM%S3_TIMEOUT_H. See Section 8.5.25.1 for more detail concerning the timer. 

• Translation Buffer Parity Error 
If the Mbox detects a TB parity error, it initiates the Asynchronous Hardware Error 
microtrap by asserting M%TB_PERR_TRAP_L. 

3. Integer Overflow 

The integer overflow microtrap request, E_FLT%IOVFL_L, is asserted in S5 when the Ebox 
detects an integer overflow condition (see Section 8.5.19.3) during the last cycle of a 
macroinstruction with overflow checking enabled. The microinstruction that checked the 
overflow condition completes, but any microinstruction initiated after it is aborted. 

4. Branch Mispredict 

A branch mispredict microtrap request, E_PSL%BRANCH_MISPREDICT_E, is asserted in S5 by 
the Ebox when the output of the Branch Queue (the Ibox branch prediction) does not match 
the branch direction calculated by the Ebox. See Section 8.5.19.3. 

5. Reserved Instruction Fault 

The Ebox initiates the reserved instruction microtrap in S4 when the Fbox is disabled and 
any Fbox instruction other than MULL is issued. It asserts E_FLT%RSVD_INSTILL to initiate 
the microtrap. 

6. Hardware Errors 

The Ebox hardware error microtrap request, E_FLT%HW_ERR_H, is asserted in S4 on 
operand-related hardware errors, such as the attempted access of an MD register which 
has its error bit set. 

7. Memory Management Exceptions 

• Reported by Mbox 

An explicit read or write request by the Ebox can result in a memory management 
exception. This causes the Mbox to assert the microtrap request signal, M%MME_TRAP_L. 
See Section 12.5.1.5.3.7 for further detail. 

• Reported by Ebox 

A memory management fault can also occur during a memory access initiated by 
the Ibox, such as for an opcode or operand specifier. When this happens the Ibox 
asserts I%IMEM_MEXC_H. The Ebox combines this signal with several other conditions to 
generate E_FLT%MME_ERR_H. It initiates the memory management microtrap by asserting 
E_FLT%MME_ERR_H in S4. See Section 8.5.15.14 and Section 8.5.19 for more detail. 

8. Reserved Addressing Mode 

A reserved addressing mode fault occurs when the Ibox detects a reserved addressing 
mode on an operand specifier. The reserved addressing mode microtrap request, 
E_FLT%RSVD_ADDR_MODE_H, is asserted in S4 by the Ebox. Refer to Section 8.5.15.14 and 
Section 8.5.19 for details. 

9. Floating Point Fault 

A floating point fault is a fault detected by the Fbox. If the current entry of the retire queue 
points to the Fbox, the request E_FLT%FLOATING_FAULT_H can be asserted. If the retire queue 
points to the Ebox, the request is stalled until the retire queue does point to the Fbox. There 
are four possible causes for assertion of E_FLT%FLOATING_FAULT_H: floating overflow, floating 
underflow, reserved operand, and floating divide by zero. The trap handler cases on the 
floating point fault code on the microtest bus. See Section 8.5.16.5 for further detail. 
9.2.3.2.2 Microtrap Request Timing 

The exceptions which result in microtrap requests to the microsequencer are detected in different 
pipeline segments. In addition, some microtrap requests are delayed in order to align the request 
with a particular pipeline segment. 

The following table gives the pipeline segment in which the exception is detected and the pipeline 
segment in which the microtrap request is made for each type of microtrap. 



Table 9-10: Microtrap Request Timing

                                                 Exception    Microtrap
Microtrap                                        Detected     Requested

Powerup                                          N/A          N/A
Asynchronous Hardware Error, S3 Stall Timer      S3           S3
Asynchronous Hardware Error, TB Parity Error     N/A          N/A
Integer Overflow                                 S5           S5
Branch Mispredict                                S5           S5
Reserved Instruction Fault                       S3           S4
Hardware Error                                   S3,S4        S4
Memory Management Exception, Mbox                N/A          N/A
Memory Management Exception, Ebox                S3,S4        S4
Reserved Addressing Mode                         S3,S4        S4
Floating Point Faults                            S4           S4



9.2.3.2.3 Prioritization of Microtraps 

Microtraps must be prioritized since more than one request may be asserted at a time. Microtrap 
priorities and microtrap handler addresses are given in the following table. 



Table 9-11: Microtraps

Priority    Microtrap                          Dispatch Address (Hex)

1           Powerup                            00
2           Asynchronous hardware errors       04
3           Integer overflow                   08
4           Branch mispredict                  0C
5           Reserved instruction fault         10
6           Hardware error                     14
7           Memory management exceptions       18
8           Reserved addressing mode faults    1C
9           Floating point faults              20
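
The prioritization and address lookup can be modeled as a priority encoder over the rows of Table 9-11. The sketch below is illustrative; the table and function names are assumptions, and the hardware performs this selection in logic rather than by a software scan.

```python
from typing import Optional, Set, Tuple

# (microtrap name, dispatch address), highest priority first, per Table 9-11.
MICROTRAP_TABLE = [
    ("Powerup", 0x00),
    ("Asynchronous hardware errors", 0x04),
    ("Integer overflow", 0x08),
    ("Branch mispredict", 0x0C),
    ("Reserved instruction fault", 0x10),
    ("Hardware error", 0x14),
    ("Memory management exceptions", 0x18),
    ("Reserved addressing mode faults", 0x1C),
    ("Floating point faults", 0x20),
]


def dispatch_microtrap(requests: Set[str]) -> Optional[Tuple[str, int]]:
    """Return the highest-priority requested microtrap and its handler address.

    Returns None when no microtrap is requested.
    """
    for name, address in MICROTRAP_TABLE:
        if name in requests:
            return name, address
    return None
```

Note that the dispatch addresses in the table are spaced four microwords apart in priority order.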



The priorities of the microtraps are assigned utilizing the following dependencies: 
1. The chip must be placed in a known state upon powerup. 

2. Once in a known state, asynchronous hardware errors take precedence over all, since they 
indicate a serious problem. 

3. Microtrap requests issued in S5 have priority over those in S4 since they are further down 
the pipe. 

4. Opcode faults take priority over operand faults. 

5. Of the requests issued in S4, whichever physically took place first (was forwarded the farthest) 
has priority. 

6. Architecturally defined faults or traps (e.g., integer overflow) have priority over 
implementation defined faults or traps (e.g., branch mispredict). 

7. Reserved addressing mode faults are mutually exclusive of operand memory management 
faults for the same operand, because the source queue is empty before a reserved addressing 
mode fault request is made. 

8. The floating point fault may only be requested when the retire queue points to the Fbox. 

9.2.3.2.4 Erroneous Microtrap Interruption 

A window of at least 4 cycles exists between initiation of a microtrap (assertion of 
E_USQ%PE_ABORT_L) and decoding of RESET.CPU for all microtraps except Branch Mispredict. 
(A subset of the RESET.CPU operations is performed immediately on detection of branch 
mispredict.) During this window, a lower priority microtrap based on state which will be cleared 
by RESET.CPU must not be allowed to interrupt the higher priority microtrap which has begun 
execution. This restriction is met by the following rules: 

• Powerup 

Powerup can interrupt any microtrap as it has the highest priority. The powerup microtrap 
is initiated by deassertion of K_E%RESET_L. Assertion of K_E%RESET_L causes all NVAX state 
to be initialized, so no microtraps will occur to interrupt powerup based on previous state. 

• Asynchronous Hardware Error 

Asynchronous hardware errors can interrupt any microtrap but Powerup. Due to the effects 
of K_E%RESET_L described above, no special logic is needed to meet this constraint. 

• Ebox-Generated Microtraps 

All Ebox-generated microtrap requests (integer overflow, branch mispredict, reserved 
instruction fault, Ebox hardware error, Ebox memory management exception, reserved 
addressing mode, and floating point faults) are cleared within the Ebox immediately on 
assertion of E_USQ%PE_ABORT_L. Thus, none of these microtraps can interrupt another. 

• Mbox-Generated Microtrap 

The Mbox memory management exception can occur at any time between assertion of 
E_USQ%PE_ABORT_L and decoding of RESET.CPU, if an Ebox-initiated memory reference 
is outstanding. The following list describes the possibility of assertion of M%MME_TRAP_L 
(initiation of the Mbox memory management exception microtrap) during each type of 
microtrap. 

• Powerup: 

As described above, M%MME_TRAP_L cannot be asserted. 

• Asynchronous Hardware Error: 
By the nature of these errors, the Ebox may be performing any operation during initiation 
of this microtrap, so M%MME_TRAP_L could be asserted. 

• Integer Overflow, Branch Mispredict: 

Detection of these traps occurs only on the last cycle of a microflow, in S5. All outstanding 
Ebox-initiated memory references which could produce an error have been completed by 
this time, so M%MME_TRAP_L cannot be asserted. 

• Reserved Instruction Fault: 

Initiation of this microtrap occurs in S4, on the first cycle of the microflow for the offending 
instruction. If that same microword begins an Ebox-initiated memory reference, the 
reference will be aborted on initiation of the microtrap. 

On initiation of a Reserved Instruction Fault microtrap, S5 can only contain the last 
microword of the previous microflow. As described above, M%MME_TRAP_L could not be 
asserted at that point. 

• Ebox Hardware Error, Ebox Memory Management Exception, Reserved Address Mode: 

These faults are generated during operand access. By microcode convention, no operands 
are referenced while there is an outstanding Ebox-initiated memory reference. Thus, 
M%MME_TRAP_L cannot be asserted. 

• Mbox Memory Management Exception: 

Multiple Ebox-initiated memory references can be outstanding at any time, so a second 
Mbox Memory Management Exception could occur. 

• Floating Point Fault: 

Similar to the Reserved Instruction Fault, this fault is detected in S4, with the first result 
transfer from the Fbox. Any memory reference initiated during this cycle will be aborted 
on initiation of the microtrap. S5 could only contain the last cycle of a microflow. Thus 
M%MME_TRAP_L cannot be asserted. 

In summary, the Mbox Memory Management Exception microtrap is the only trap which could 
incorrectly interrupt a higher priority microtrap in this window. In order to prevent this 
error, detection of the Mbox Memory Management Exception is blocked at the microtrap logic 
for the cycles from microtrap initiation (assertion of E_USQ%PE_ABORT_L) through execution of 
RESET.CPU (assertion of E_MSC%EARLY_FLUSH_EBOX_H). Mbox Memory Management Exception 
detection is enabled again in the cycle following execution of RESET.CPU. 

Branch Mispredict is the only microtrap for which RESET.CPU is not executed. In this case, 
E_MSC%EARLY_FLUSH_EBOX_H is asserted in the same cycle as E_USQ%PE_ABORT_L; therefore, 
detection of the Mbox Memory Management Exception is only blocked at the microsequencer for 
one cycle. However, as described above, M%MME_TRAP_L cannot be asserted during the Branch 
Mispredict microtrap, so the blocking is not necessary for proper execution of this microtrap. 
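
The blocking window can be pictured as a single state bit that is set on assertion of E_USQ%PE_ABORT_L and released in the cycle after E_MSC%EARLY_FLUSH_EBOX_H. The following is a behavioral sketch under that reading of the text; the class and method names are hypothetical, and signals are modeled as plain booleans sampled once per cycle.

```python
class MmeTrapBlocker:
    """Cycle-level model of Mbox MME microtrap blocking (sketch)."""

    def __init__(self) -> None:
        self.blocked = False

    def cycle(self, pe_abort: bool, early_flush: bool, mme_trap: bool) -> bool:
        """Advance one cycle; return True if the Mbox MME request is detected.

        pe_abort:    a microtrap was initiated this cycle
        early_flush: RESET.CPU was executed this cycle
        mme_trap:    the Mbox is requesting its MME microtrap this cycle
        """
        if pe_abort:
            self.blocked = True              # blocking starts with the abort
        detected = mme_trap and not self.blocked
        if early_flush:
            self.blocked = False             # re-enabled from the next cycle
        return detected
```

For Branch Mispredict, both signals are asserted in the same cycle, so the model blocks detection for exactly one cycle, matching the description above.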

9.2.3.2.5 Microtrap Detection Abort Effects 

The microsequencer aborts operation on detection of a microtrap (assertion of 
E_USQ%PE_ABORT_L). The following table shows the timing for all 
microsequencer logic that is cleared or reset on an abort. 



DIGITAL CONFIDENTIAL 



The Microsequencer 9-17 



NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 



Table 9-12: Abort Effects in the Microsequencer

Phase    What is Cleared/Reset




■p TTBQ CTT UtT -ATI? TWO RT^T T T 


Φ2       E_USQ%MIB_H to S3
         E_USQ%MACRO_1ST_CYCLE_H to S3
         E%FBOX_1ST_CYCLE_L to Fbox


Φ3 


E_U8<^STL%VERy_LAOT_IJSQ_S'IALI^L 




         E%FBOX_1ST_CYCLE_L to Fbox
         E_USQ%MACRO_1ST_CYCLE_H master latch







9.2.3.3 Last Cycle Logic 

The last cycle logic examines several conditions used to determine which new microflow is to be 
taken when LAST.CYCLE or LAST.CYCLE.OVERFLOW is detected on E_USQ_CSM%UMIB_H, no 
microtraps are detected, and no test address is driven. There are five possible new microflows, 
listed in order of priority: 

1. Interrupt Request Handler 

2. Trace Fault Handler 

3. First Part Done Handler 

4. Instruction Queue Stall 

5. The macroinstruction microcode indicated by the top entry in the instruction queue. 

The last cycle logic prioritizes these sources and performs address lookup. In addition, the signal 
E_USQ_LST%SELECT_IQ_H is derived. This signal is asserted when a valid entry is taken from the 
instruction queue. 



Table 9-13: Microaddresses for Last Cycle Interrupts or Exceptions

Priority    Interrupt or Exception     Dispatch Address (Hex)

1           Interrupt request          24
2           Trace fault                28
3           First part done            2C
4           Instruction Queue Stall    30

The priorities in the last cycle logic are assigned using the following dependencies: 



1. Interrupts and trace faults must be handled between instructions. (Interrupts may also be 
serviced at defined points during long instructions such as string instructions; this servicing 
is handled by microcode.) 

2. By definition, an interrupt that is permitted to request service has a higher priority level 
(IPL) than any exception that occurs in the process to be interrupted, or any instruction to 
be executed by that process. 
3. When tracing is enabled (E_PSL%PSL_H<TP> is set), a trace fault must be taken before the 
execution of each instruction. 

4. If an instruction begins execution with PSL<FPD> set, the first part done handler must be 
entered rather than the normal entry point for the instruction. 

5. PSL<TP> and PSL<FPD> cannot both be set when an instruction begins execution. In order 
for PSL<FPD> to be set, the instruction must have been interrupted previously; the interrupt 
handler always clears PSL<TP> before saving the PSL when interrupting an instruction. 
(Note that the interrupt handler does not clear PSL<TP> when the interrupt is taken between 
instructions.) 

6. The Instruction Queue Stall microword is executed if an opcode is requested from the 
Instruction Queue but the queue is empty. 

9.2.3.3.1 Interrupts 

Interrupt servicing is requested by the Ebox by assertion of E%INT_REQ_H. For more information 
on interrupts, see Chapter 10. 

9.2.3.3.2 Trace Fault 

A trace fault should be requested when the PSL<TP> bit is set. Due to the pipelined 
implementation of the Ebox, a local version of the PSL<TP> bit must be maintained; thus, the 
trace fault is actually requested when LOCAL_TP is asserted. 

There are two cases that must be considered in setting LOCAL_TP. In the first case, a 
macroinstruction starts execution with PSL<T> set. This is the normal program tracing mode. 
LOCAL_TP must be set immediately after the macroinstruction begins execution. In the second 
case, an interrupt was taken at the end of a macroinstruction, and the trace must be taken 
when interrupt processing completes. In this case, PSL<TP> is set, and LOCAL_TP is asserted. 
LOCAL_TP is also updated whenever the PSL is written. LOCAL_TP is cleared by loading the 
PSL as a longword, with a value of 0 in the <TP> bit. 

9.2.3.3.3 First Part Done 

The first part done handler is selected when PSL<FPD> is asserted and the instruction queue 
output is valid. The top entry in the instruction queue is removed (E_USQ_LST%SELECT_IQ_H 
is asserted), but the last cycle address is the first part done handler address, rather than the 
dispatch taken from the instruction queue. 

If PSL<FPD> is asserted and the instruction queue is empty, the Instruction Queue Stall 
microword is selected. 
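
The last cycle selection described in this section can be summarized as a priority chain. The sketch below is a reader's model, not the hardware implementation: the function name and argument names are assumptions, the handler addresses are taken from Table 9-13, and the instruction queue dispatch is placed in address bits <9:1> as given in Table 9-15.

```python
from typing import Tuple

INTERRUPT_ADDR = 0x24
TRACE_FAULT_ADDR = 0x28
FIRST_PART_DONE_ADDR = 0x2C
IQ_STALL_ADDR = 0x30


def last_cycle_select(int_req: bool, local_tp: bool, psl_fpd: bool,
                      iq_valid: bool, iq_dispatch: int) -> Tuple[int, bool]:
    """Return (next microaddress, SELECT_IQ) for a last cycle (sketch).

    SELECT_IQ is True only when a valid entry is removed from the
    instruction queue.
    """
    if int_req:
        return INTERRUPT_ADDR, False
    if local_tp:
        return TRACE_FAULT_ADDR, False
    if not iq_valid:
        # Also covers PSL<FPD> set with an empty instruction queue.
        return IQ_STALL_ADDR, False
    if psl_fpd:
        # The queue entry is removed, but its dispatch is not used.
        return FIRST_PART_DONE_ADDR, True
    return (iq_dispatch & 0x1FF) << 1, True
```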

9.2.3.3.3.1 Interaction with Reserved instructions 

The Ibox detects unimplemented instructions (such as POLYx), and causes the microcode to 
enter the reserved instruction fault handler by placing the microaddress for that handler in the 
dispatch field of the instruction queue entry for the unimplemented instruction. However, if 
PSL<FPD> is asserted, the last cycle logic selects the first part done handler rather than the 
reserved instruction fault handler. The first part done handler detects this case and branches to 
the reserved instruction fault handler. 
9.2.3.3.4 Instruction Queue 

The instruction queue is a FIFO filled by the Ibox. This queue permits the Ibox to fetch and 
decode instructions ahead of Ebox execution. 

The instruction queue is 6 entries deep. Each entry is 22 bits long. The format of each entry is 
as follows: 



Figure 9-3: Instruction Queue Entry Format 



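 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|          OPCODE          | DL  |FI|         DISPATCH         | V|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+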








Table 9-14: Instruction Queue Entry Format Field Definitions

Name        Extent    Description

OPCODE      21:13     9-bit opcode of the instruction.
DL          12:11     Initial data length of instruction operands.
FI          10        Set if entry is an Fbox instruction.
DISPATCH    9:1       Microcode address of the instruction's microflow.
V           0         Set if entry is valid.



The instruction queue entry indicated by the write pointer is written in Φ4. The write pointer is 
advanced in Φ2, and the valid bit is set in the new queue entry. 

The instruction queue entry indicated by the read pointer is read in Φ1. The address used to 
access the control store is derived from the instruction queue entry as follows: 



Table 9-15: Control Store Address Formation 



Bit(s) Value 

<10> 0 

<9:1> IQ entry DISPATCH field 

<0> 0 
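
The entry format of Table 9-14 and the address formation of Table 9-15 can be illustrated together. The helper names below are hypothetical and the code is only a sketch of the bit layout.

```python
def unpack_iq_entry(entry: int) -> dict:
    """Split a 22-bit instruction queue entry into its fields (Table 9-14)."""
    if not 0 <= entry < (1 << 22):
        raise ValueError("instruction queue entry must fit in 22 bits")
    return {
        "OPCODE": (entry >> 13) & 0x1FF,   # bits <21:13>
        "DL": (entry >> 11) & 0x3,         # bits <12:11>
        "FI": (entry >> 10) & 0x1,         # bit <10>
        "DISPATCH": (entry >> 1) & 0x1FF,  # bits <9:1>
        "V": entry & 0x1,                  # bit <0>
    }


def control_store_address(entry: int) -> int:
    """Form the 11-bit control store address from an entry (Table 9-15).

    Bit <10> and bit <0> are zero; bits <9:1> carry the DISPATCH field.
    """
    return unpack_iq_entry(entry)["DISPATCH"] << 1
```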



If the valid bit of the entry being read is set, and the instruction queue is selected as the CAL 
Input Bus source, E_USQ%MACRO_1ST_CYCLE_H is asserted and driven to the Ebox in Φ1 of S3. 
This signal is cleared on a microtrap, and stalls on a microsequencer stall. If the first cycle of an 
Fbox instruction is detected (<FI> is asserted), the signal E%FBOX_1ST_CYCLE_L is also asserted, 
and driven to the Fbox in Φ23 of S2. This signal is only asserted once per instruction; it is not 
stalled on a microsequencer stall. 

If the valid bit of the entry to be read is not set, and the instruction queue is selected as the CAL 
Input Bus source, the last cycle logic selects the Instruction Queue Stall microaddress (030 (hex)), 
which is used to look up the stall microword in the control store. The stall microword is a NOP for 
the Ebox; it selects the last cycle logic again in the microsequencer. In addition to driving the stall 
microword to the Ebox, E_USQ%IQ_STALL_H is asserted in Φ1 of S2. This signal, in conjunction 
with memory management and hardware error signals driven by the Ibox, is used by the Ebox 
to detect instruction stream referencing errors. 

The read pointer is advanced in Φ3 if E_USQ_CSM%UMIB_H selects the last cycle logic, the last 
cycle logic selects the instruction queue, and the valid bit in the queue entry that was read out is 
set. When the read pointer is advanced, the valid bit in the entry read out is cleared. The read 
pointer is stalled on a microsequencer stall. 

The instruction queue is flushed when the Ebox decodes RESET.CPU from the MIB 
(E_MSC%EARLY_FLUSH_EBOX_H is asserted). The pointers are reset, and the entry valid bits are 
cleared. 

Table 9-16 shows the phase-by-phase events that occur on an instruction queue stall. Initially, 
the read and write pointers both have a value of 4; the queue is empty. 



Table 9-16: Instruction Queue Operation

Phase    Action

Microcycle 1
1        E_USQ_CSM%UMIB_H = LAST.CYCLE
         E_USQ%IQ_STALL_H asserted
2        Last microword of instruction flow driven to S3
3        CAL = IQ stall address
4        Write I%IQ_BUS_H to Entry[4] (value = valid data)

Microcycle 2
1        E_USQ_CSM%UMIB_H = LAST.CYCLE
         E_USQ_INQ%IQ_OUT_H = Entry[4]
         E_USQ_LST%SELECT_IQ_H asserted
         E_USQ%IQ_STALL_H deasserted
2        NOP microword driven to S3
         Increment write pointer (pointer=5)
3        CAL = Entry[4]
         Increment read pointer (pointer=5)
         Clear valid bit in Entry[4]
4        Write I%IQ_BUS_H to Entry[5] (value = valid data)

Microcycle 3
1        E_USQ_CSM%UMIB_H = microsequencer field of first microword
         E_USQ%MACRO_1ST_CYCLE_H asserted
2        First microword of new instruction flow driven to S3
         Increment write pointer (pointer=6)
3
4


9.2.3.3.4.1 Instruction Context Latches 

The instruction queue drives the dispatch address to the last cycle logic. The remainder of the 
queue entry (DL,OPCODE,FI) is latched in the instruction context (ICTX) latches. The format is 
as follows: 



Figure 9-4: Instruction Context Format 

 11 10 09 08|07 06 05 04|03 02 01 00 
+--+--+--+--+--+--+--+--+--+--+--+--+ 
|          OPCODE          | DL  |FI| 
+--+--+--+--+--+--+--+--+--+--+--+--+ 



Table 9-17: Instruction Context Format Field Definitions 

Name Extent Description 

OPCODE 11:3 9-bit opcode of the instruction. 

DL 2:1 Initial data length of instruction operands. 

FI 0 Set if entry is an Fbox instruction. 
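
The field extents in Table 9-17 translate directly into shifts and masks. The sketch below only illustrates the bit layout; the function and type names are invented for the example and do not correspond to anything in the design.

```python
from typing import NamedTuple


class InstructionContext(NamedTuple):
    opcode: int   # bits <11:3>, 9-bit opcode
    dl: int       # bits <2:1>, initial data length
    fi: bool      # bit <0>, set for an Fbox instruction


def decode_ictx(word: int) -> InstructionContext:
    """Split a 12-bit instruction context value into its three fields."""
    return InstructionContext(
        opcode=(word >> 3) & 0x1FF,
        dl=(word >> 1) & 0x3,
        fi=bool(word & 0x1),
    )


def encode_ictx(ctx: InstructionContext) -> int:
    """Inverse of decode_ictx."""
    return ((ctx.opcode & 0x1FF) << 3) | ((ctx.dl & 0x3) << 1) | int(ctx.fi)
```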



The output of the queue is latched every #2 for hold-time reasons. The ICTX master latch operates 
in #4 of S2, and is loaded from the queue output latch only when a valid entry is removed from 
the instruction queue (E_USQ_LST%SELECT_IQ_H is asserted). The ICTX slave latch operates in 
#1 of S3; its output (E_USQ%ICTX_H) is driven to the Ebox. The instruction context latches are 
only valid when their respective pipeline stages are executing macroinstructions. 

Both the master and slave latches are stalled on a microsequencer stall. The slave latch is 
stalled holding the correct value for the current S3 cycle, and the master latch is stalled holding 
the correct value for the next cycle. 

The opcode portion of the instruction context (E%FOPCODE_H) is driven to the Fbox from the 
instruction queue output latch, in #2 of S2. 

9.2.3.4 Microstack 

Frequently used microcode can be made into microsubroutines. When a microsubroutine is called, 
the return address is pushed onto the microstack. The output of the microstack is driven on the 
CAL Input Bus when a RETURN is decoded from E_USQ_CSM%UMIB_H, no microtraps are 
detected, and no test address is driven. 

The microstack is 6 entries deep. It is a circular stack, with the write pointer always one entry 
ahead of the read pointer. Each entry is an 11-bit control store address. The addresses stored in 
the microstack incorporate any modification done by the microtest bus. 

Every #1, the entry indicated by the microstack read pointer is read out into a #1 latch, where 
it is held to be driven on the CAL Input Bus in #3. Also in #1, the RETURN address is written 
into the entry ahead of the microstack read pointer. The RETURN address is formed by adding 
1 to bits <3:0> of the CALL address in the CAL. Bits <10:4> are unchanged. 

The microstack pointer is incremented in #4 on a CALL or CONDITIONAL CALL 
microinstruction; it is decremented on a RETURN microinstruction. The microstack pointer 
is stalled on a microsequencer stall. It is only reset when the chip reset signal, K_E%RESET_L, is 
asserted. 
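
The return-address rule and the pointer behavior can be modelled in a few lines. This is a sketch under stated assumptions: the class and function names are invented, stalls and conditional calls are left out, and the low four address bits are assumed to wrap without carrying into bits <10:4>, which is one reading of "Bits <10:4> are unchanged".

```python
STACK_DEPTH = 6  # the microstack is 6 entries deep


def return_address(call_addr: int) -> int:
    """Add 1 to bits <3:0> of the CALL address; bits <10:4> are unchanged."""
    return (call_addr & 0x7F0) | ((call_addr + 1) & 0x00F)


class Microstack:
    """Circular stack; the write entry is always one ahead of the read pointer."""

    def __init__(self):
        self.array = [0] * STACK_DEPTH
        self.pointer = 0  # reset value

    def call(self, call_addr: int) -> None:
        # The return address is written into the entry ahead of the pointer,
        # then the pointer is incremented.
        write_entry = (self.pointer + 1) % STACK_DEPTH
        self.array[write_entry] = return_address(call_addr)
        self.pointer = write_entry

    def ret(self) -> int:
        # The entry at the pointer is driven out, then the pointer is decremented.
        addr = self.array[self.pointer]
        self.pointer = (self.pointer - 1) % STACK_DEPTH
        return addr
```

Starting from a pointer value of 2, a CALL at address X writes X+1 to entry 3 and leaves the pointer at 3; the following RETURN reads entry 3 and restores the pointer to 2.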



Figure 9-5: Microstack Organization 



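POINTER 
   | 
   |      +---+--------------------------+ 
   |      | 0 |                          | 
   |      +---+--------------------------+ 
   |      | 1 | First Call writes here   | 
   |      +---+--------------------------+ 
   +----->| 2 | Pointer = 2 read entry   | 
   |      +---+--------------------------+ 
   +----->| 3 | Pointer = 2 write entry  | 
          +---+--------------------------+ 
          | 4 |                          | 
          +---+--------------------------+ 
          | 5 |                          | 
          +---+--------------------------+ 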



Consider a CALL followed immediately by a RETURN with an initial microstack pointer value of 
2. Table 9-18 shows the phase-by-phase operation of the microstack during the next three cycles. 

X: CALL Y 

X+1: {next microword} 



Table 9-18: Microstack Pointer Example 

Phase   Action 

Microcycle 1 

1 
2 
3       CAL = X 
4 

Microcycle 2 

1       Write X+1 [1] to Array[3] 
        USTACK_OUT<10:0> = Array[2] 
        E_USQ_CSM%UMIB_H = CALL 
2 
3       CAL = Y 
4       Increment microstack pointer (pointer=3) 

Microcycle 3 

1       Write Y+1 to Array[4] 
        USTACK_OUT<10:0> = Array[3] (value = X+1) 
        E_USQ_CSM%UMIB_H = RETURN 
2 
3       CAL = X+1 
4       Decrement microstack pointer (pointer=2) 

[1] Assumption: the result of the increment to bits <3:0> of X is X+1. 



9.2.4 Stall Logic 

The microsequencer is stalled whenever S3 is stalled. The Ebox derives the signal 
E_STL%USEQ_STALL_H which is used to stall the microsequencer. The microsequencer creates 
delayed versions of this signal as needed to stall various latches. The signals E_USQ%PE_ABORT_L 
(asserted on initiation of a microtrap) and E_USQ_TST%FORCE_TEST_ADDR_L (asserted on detection 
of the Test Address Generator driving a control store microaddress, see Section 9.5) break a 
microsequencer stall by clearing the delayed versions of E_STL%USEQ_STALL_H. 

The following table shows the timing for all stallable logic in the microsequencer. 



Table 9-19: Stall Timing in the Microsequencer 

Phase   What Stalls 

        ICTX slave latch to S3 
        E_USQ%MACRO_1ST_CYCLE_H latch to S3 
        E_USQ%MIB_H to S3 
#3      Current Address Latch 
        E_USQ%MACRO_1ST_CYCLE_H master latch 
        Instruction queue read pointer 
        ICTX master latch to S3 
        Microstack pointer 



9.3 Initialization 

A reset (assertion of K_E%RESET_L) causes the microsequencer to initialize in the following state: 

• A powerup microtrap is initiated (see Table 9-12 for microtrap ABORT effects). 

• The microstack pointer is reset to zero. 

• The instruction queue valid bits are flushed and its pointers are reset by 
E_MSC%EARLY_FLUSH_EBOX_H. 

• The Patchable Control Store CAM is disabled, since PCSCR<PCS_ENB> is cleared in the 
Ebox. 



9-24 The Microsequencer DIGITAL CONFIDENTIAL 



NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 



• The MIB scan chain is controlled by the Parallel Test Port command pins, since 
PCSCR<PAR_PORT_DIS> is cleared in the Ebox. 

• The Test Address Generator is reset to an address value of zero. 

9.4 Microcode Restrictions 

1. Every microtrap except Branch Mispredict must contain a RESET.CPU in order to reset the 
Instruction Queue. (The Ebox is flushed automatically, clearing the queues, on detection 
of branch mispredict.) RESET.CPU must not be issued within the 3 microwords preceding 
LAST.CYCLE in order to allow time for the Instruction Queue to be cleared (if RESET.CPU 
is present in microword N, LAST.CYCLE cannot be present until microword N+4). 

2. For correct operation of Trace Fault and First Part Done in the Last Cycle Logic, 
PSL<T,TP,FPD> must not be changed within the 2 microwords preceding LAST.CYCLE (if 
any of these PSL bits are changed in microword N, LAST.CYCLE cannot be present until 
microword N+3). 

3. No Ebox-initiated memory requests can be made in the last cycle of a micro-flow, other than 
writes with the translation already known to be valid. 

4. No Ebox-initiated memory requests can be outstanding when the microcode references an 
operand (queue entry or register file location). 

5. The instruction queue stall microword must indicate LAST.CYCLE. 

6. PSL<TP> must be cleared by the interrupt handler before it allows execution of an interrupted 
instruction to resume. 

7. The Patchable Control Store (PCS) WRITE command, issued by writing a "1" into 
PCSCR<PCS_WRITE> in microinstruction N, must not be followed by a PCS ENABLE 
command (issued by writing a "1" into PCSCR<PCS_ENB>) before microinstruction N+2. 

8. Following the writing of the Patchable Control Store ENABLE bit (PCSCR<PCS_ENB>) in 
S5 by microinstruction N, the first microinstruction for which Patchable Control Store can be 
considered enabled is microinstruction N+4. 

9. The First Part Done microflow must check for the case in which an unimplemented instruction 
begins execution with PSL<FPD> set. In this case, microcode must branch to the Reserved 
Instruction Fault microflow, rather than executing the normal First Part Done microflow. 
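
Restrictions 1 and 2 are simple distance rules, so they lend themselves to a mechanical check. The sketch below is hypothetical tooling, not part of the microcode assembler: a flow is modelled as a list of sets of tags, and the tag `"PSL_CHANGE"` is an invented stand-in for any microword that modifies PSL<T,TP,FPD>. Only the microwords strictly preceding LAST.CYCLE are examined.

```python
def check_flow(flow):
    """Return (offending_index, last_cycle_index, tag) for each violation.

    flow is a list of sets; each set holds the tags present in one microword.
    """
    # tag -> number of preceding microwords in which it must not appear
    rules = (("RESET.CPU", 3), ("PSL_CHANGE", 2))
    violations = []
    for n, word in enumerate(flow):
        if "LAST.CYCLE" not in word:
            continue
        for tag, window in rules:
            for k in range(max(0, n - window), n):
                if tag in flow[k]:
                    violations.append((k, n, tag))
    return violations
```

For example, RESET.CPU in microword 0 followed by LAST.CYCLE in microword 3 is reported, while LAST.CYCLE in microword 4 (N+4) passes.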

9.5 Testability 
9.5.1 Test Address 

The control store microaddress is both controllable and observable. A microcode address can be 
driven to the microsequencer from the Test Address Generator. The Test Address Generator is an 
11-bit counter which is initialized to a value of zero on assertion of K_E%RESET_L. It increments 
its address counter once on each deassertion of T%CS_TEST_H, thus cycling through all possible 
control store addresses. 

This microaddress source takes priority over all others. To ensure immediate control store 
lookup using this microaddress, assertion of T%CS_TEST_H sets an S/R latch whose output is 
E_USQ_TST%FORCE_TEST_ADDR_L. Assertion of this signal breaks any stall on #2, #3, and #4 
latches in the microsequencer. This allows the control store to operate, driving the selected 
microword into the MIB scan chain (see Section 9.5.2). The Ebox stall(s), if any, are unaffected, 
along with stalls on #1 latches in the microsequencer. 

E_USQ_TST%FORCE_TEST_ADDR_L is deasserted when the Test Address Generator has completed 
generation of all possible addresses (when its counter overflows). 
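
The counter and latch behavior described above can be sketched as follows. The class and attribute names are invented for the example, and the active-low force signal is modelled as a boolean that is True while asserted.

```python
class AddressGenerator:
    """Sketch of the 11-bit Test Address Generator and its force latch."""

    WIDTH = 11

    def __init__(self):
        self.reset()

    def reset(self):
        """Chip reset: address counter returns to zero."""
        self.address = 0
        self.force_test_addr = False   # True = FORCE_TEST_ADDR asserted
        self._cs_test = False          # last sampled level of T%CS_TEST_H

    def set_cs_test(self, level: bool) -> None:
        if level and not self._cs_test:
            # Assertion of T%CS_TEST_H sets the S/R latch.
            self.force_test_addr = True
        elif self._cs_test and not level:
            # Each deassertion increments the counter once.
            self.address = (self.address + 1) % (1 << self.WIDTH)
            if self.address == 0:
                # Counter overflow: all addresses have been generated.
                self.force_test_addr = False
        self._cs_test = level
```

Pulsing the input 2048 times walks the counter through every control store address and leaves the force latch cleared.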

The microaddress driven from the CAL can be observed on the Parallel Test Port data pins 
under control of the Parallel Test Port command pins. The microsequencer drives to the Parallel 
Test Port in #1. 

Figure 9-6: Parallel Port Output Format 



 11 10 09 08|07 06 05 04|03 02 01 
+--+--+--+--+--+--+--+--+--+--+--+ 
|           CAL OUTPUT           | 
+--+--+--+--+--+--+--+--+--+--+--+ 



Table 9-20: Parallel Port Output Format Field Definitions 

Name Extent Description 

CAL OUTPUT 11:1 Microaddress driven from CAL 



9.5.2 MIB Scan Chain 

A 92-bit scan chain is present at the output of the MIB, allowing the complete microword to be 
latched and scanned out of the chip. The scan chain master latches operate in #4; the slave 
latches operate in #2. In observe mode, the scan chain is loaded and shifted under control of the 
Parallel Test Port command pins. When scanning out, MIB scan chain bit<91> is the first bit to 
reach the Parallel Test Port. 

Note that control of the MIB scan chain must be given to the parallel port during this operation, 
by writing a 0 to PCSCR<PAR_PORT_DIS>. See Section 8.5.22.1 for details. 



Table 9-21: Contents of MIB Scan Chain, in Observe Mode 

Position   Description        Comment 

<91>       E_USQ%MIB_H<0>     Microword Field BRANCH.OFFSET [1] 
<90>       E_USQ%MIB_H<1> 
<89>       E_USQ%MIB_H<2> 
<88>       E_USQ%MIB_H<3> 
<87>       E_USQ%MIB_H<4> 
<86>       E_USQ%MIB_H<5> 
<85>       E_USQ%MIB_H<6> 
<84>       E_USQ%MIB_H<7> 
<83>       E_USQ%MIB_H<34>    Microword Field L 
<82>       E_USQ%MIB_H<49>    Microword Field MISC1 
<81>       E_USQ%MIB_H<48> 
<80>       E_USQ%MIB_H<47> 
<79>       E_USQ%MIB_H<46> 
<78>       E_USQ%MIB_H<60>    Microword Field FMT 
<77>       E_USQ%MIB_H<19>    Microword Field MISC 
<76>       E_USQ%MIB_H<18> 
<75>       E_USQ%MIB_H<17> 
<74>       E_USQ%MIB_H<16> 
<73>       E_USQ%MIB_H<15> 
<72>       E_USQ%MIB_H<31>    Microword Field DST 
<71>       E_USQ%MIB_H<30> 
<70>       E_USQ%MIB_H<29> 
<69>       E_USQ%MIB_H<28> 
<68>       E_USQ%MIB_H<27> 
<67>       E_USQ%MIB_H<26> 
<66>       E_USQ%MIB_H<25>    Microword Field A 
<65>       E_USQ%MIB_H<24> 
<64>       E_USQ%MIB_H<23> 
<63>       E_USQ%MIB_H<22> 
<62>       E_USQ%MIB_H<21> 
<61>       E_USQ%MIB_H<20> 
<60:50>    Value Undefined    No Observe Input 
<49>       E_USQ%MIB_H<10>    Microword Field SEQ.COND 
<48>       E_USQ%MIB_H<9> 
<47>       E_USQ%MIB_H<8> 
<46>       E_USQ%MIB_H<14>    Microword Field SEQ.FMT 
<45>       E_USQ%MIB_H<13>    Microword Field SEQ.CALL 
<44>       E_USQ%MIB_H<12>    Microword Field SEQ.COND 
<43>       E_USQ%MIB_H<11> 
<42>       E_USQ%MIB_H<39>    Microword Field B 
<41>       E_USQ%MIB_H<38> 
<40>       E_USQ%MIB_H<37> 
<39>       E_USQ%MIB_H<36> 
<38>       E_USQ%MIB_H<35> 
<37>       E_USQ%MIB_H<44>    Microword Field MISC2 
<36>       E_USQ%MIB_H<43> 
<35>       E_USQ%MIB_H<42> 
<34>       E_USQ%MIB_H<41> 
<33>       E_USQ%MIB_H<45>    Microword Field LIT 
<32>       E_USQ%MIB_H<40>    Microword Field D 
<31>       E_USQ%MIB_H<54>    Microword Field MRQ 
<30>       E_USQ%MIB_H<53> 
<29>       E_USQ%MIB_H<52> 
<28>       E_USQ%MIB_H<51> 
<27>       E_USQ%MIB_H<50> 
<26>       E_USQ%MIB_H<33>    Microword Field W 
<25>       E_USQ%MIB_H<32>    Microword Field V 
<24>       E_USQ%MIB_H<59>    Microword Field ALU 
<23>       E_USQ%MIB_H<58> 
<22>       E_USQ%MIB_H<57> 
<21>       E_USQ%MIB_H<56> 
<20>       E_USQ%MIB_H<55> 
<19:0>     Value Undefined    No Observe Input 

[1] See Chapter 6 for details on microword fields. 
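
Software that post-processes a captured scan chain has to undo the position-to-bit permutation of Table 9-21. The sketch below assumes the assignment listed in that table, with each group of positions running downward from its first entry; the names are invented for the example, and the captured chain is represented as an integer whose bit p holds scan position <p>.

```python
# (first scan position, MIB bits in scan order) for each group in Table 9-21.
SCAN_GROUPS = [
    (91, [0, 1, 2, 3, 4, 5, 6, 7]),        # BRANCH.OFFSET
    (83, [34]),                            # L
    (82, [49, 48, 47, 46]),                # MISC1
    (78, [60]),                            # FMT
    (77, [19, 18, 17, 16, 15]),            # MISC
    (72, [31, 30, 29, 28, 27, 26]),        # DST
    (66, [25, 24, 23, 22, 21, 20]),        # A
    (49, [10, 9, 8]),                      # SEQ.COND (low part)
    (46, [14]),                            # SEQ.FMT
    (45, [13]),                            # SEQ.CALL
    (44, [12, 11]),                        # SEQ.COND (high part)
    (42, [39, 38, 37, 36, 35]),            # B
    (37, [44, 43, 42, 41]),                # MISC2
    (33, [45]),                            # LIT
    (32, [40]),                            # D
    (31, [54, 53, 52, 51, 50]),            # MRQ
    (26, [33]),                            # W
    (25, [32]),                            # V
    (24, [59, 58, 57, 56, 55]),            # ALU
]

POSITION_TO_MIB = {
    first - i: bit
    for first, bits in SCAN_GROUPS
    for i, bit in enumerate(bits)
}


def microword_from_scan(scan_bits: int) -> int:
    """Rebuild the microword from a 92-bit scan chain image.

    Positions with no observe input (<60:50> and <19:0>) are ignored.
    """
    word = 0
    for position, mib_bit in POSITION_TO_MIB.items():
        if (scan_bits >> position) & 1:
            word |= 1 << mib_bit
    return word
```

A quick consistency check on the mapping is that it covers each microword bit <60:0> exactly once.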

9.6 Signal Cross Reference 

Note that the signal names used in this specification are the schematic signal names. 



Table 9-22: Schematic Signal Names, in Alphabetical Order 

Schematic Signal Name              Behavioral Model Signal Name 

E%FBOX_1ST_CYCLE_L                 E%FBOX_1ST_CYCLE_L 
E%FOPCODE_H                        E%FOPCODE_H 
E%INT_REQ_H                        E%INT_REQ_H 
E_BUS%UTEST_L                      E_BUS%UTEST_H 
E_FLT%FLOATING_FAULT_H             E%FLOATING_FAULT_H 
E_FLT%HW_ERR_H                     E%HW_ERR_H 
E_FLT%IOVFL_L                      E%IOVFL_H 
E_FLT%MME_ERR_H                    E%MME_ERR_H 
E_FLT%RSVD_ADDR_MODE_H             E%RSVD_ADDR_MODE_H 
E_FLT%RSVD_INSTR_L                 E%RSVD_INSTR_FAULT_H 
E_MSC%EARLY_FLUSH_EBOX_H           E_MSC%EARLY_FLUSH_EBOX_H 
E_PSL%BRANCH_MISPREDICT_H          E%BRANCH_MISPREDICT_H 
E_PSL%PSL_H                        E_PSL%PSL_H 
E_STL%USEQ_STALL_H                 E_STL%USEQ_STALL_H 
E_TIM%S3_TIMEOUT_H                 E_TIM%S3_TIMEOUT_H 
E_USQ%ICTX_H                       E_USQ%ICTX_H 
E_USQ%IQ_STALL_H                   E_USQ%IQ_STALL_H 
E_USQ%MACRO_1ST_CYCLE_H            E_USQ%MACRO_1ST_CYCLE_H 
E_USQ%MIB_H                        E_USQ%MIB_H 
E_USQ%PE_ABORT_L                   E_USQ%PE_ABORT_H 
E_USQ%UTSEL_H                      E_USQ%UTSEL_H 
E_USQ_BUS%CAL_INPUT_L              E_USQ_BUS%CAL_INPUT_L 
E_USQ_CAL%CAL_H                    E_USQ_CAL%CAL_H 
E_USQ_CSM%UMIB_H                   E_USQ_CSM%UMIB_H 
E_USQ_INQ%IQ_OUT_H                 E_USQ_INQ%IQ_OUT_H 
E_USQ_LST%SELECT_IQ_H              E_USQ_LST%SELECT_IQ_H 
E_USQ_STL%LATE_USQ_STALL_L         E_USQ_STL%LATE_USQ_STALL_L 
E_USQ_STL%VERY_LATE_USQ_STALL_L    E_USQ_STL%VERY_LATE_USQ_STALL_L 
E_USQ_TST%FORCE_TEST_ADDR_L        E_USQ_TST%FORCE_TEST_ADDR_L 
I%IMEM_MEXC_H                      I%IMEM_MEXC_H 
I%IQ_BUS_H                         I%IQ_BUS_H 
K_E%RESET_L                        K%RESET_L 
M%MME_TRAP_L                       M%MME_TRAP_H 
M%TB_PERR_TRAP_L                   M%TB_PERR_TRAP_H 
T%CS_TEST_H                        T%CS_TEST_H 



Table 9-23: Behavioral Model Signal Names, in Alphabetical Order 

Behavioral Model Signal Name       Schematic Signal Name 

E%BRANCH_MISPREDICT_H              E_PSL%BRANCH_MISPREDICT_H 
E%FBOX_1ST_CYCLE_L                 E%FBOX_1ST_CYCLE_L 
E%FLOATING_FAULT_H                 E_FLT%FLOATING_FAULT_H 
E%FOPCODE_H                        E%FOPCODE_H 
E%HW_ERR_H                         E_FLT%HW_ERR_H 
E%INT_REQ_H                        E%INT_REQ_H 
E%IOVFL_H                          E_FLT%IOVFL_L 
E%MME_ERR_H                        E_FLT%MME_ERR_H 
E%RSVD_ADDR_MODE_H                 E_FLT%RSVD_ADDR_MODE_H 
E%RSVD_INSTR_FAULT_H               E_FLT%RSVD_INSTR_L 
E_BUS%UTEST_H                      E_BUS%UTEST_L 
E_MSC%EARLY_FLUSH_EBOX_H           E_MSC%EARLY_FLUSH_EBOX_H 
E_PSL%PSL_H                        E_PSL%PSL_H 
E_STL%USEQ_STALL_H                 E_STL%USEQ_STALL_H 
E_TIM%S3_TIMEOUT_H                 E_TIM%S3_TIMEOUT_H 
E_USQ%IQ_STALL_H                   E_USQ%IQ_STALL_H 
E_USQ%MACRO_1ST_CYCLE_H            E_USQ%MACRO_1ST_CYCLE_H 
E_USQ%MIB_H                        E_USQ%MIB_H 
E_USQ%PE_ABORT_H                   E_USQ%PE_ABORT_L 
E_USQ%UTSEL_H                      E_USQ%UTSEL_H 
E_USQ_BUS%CAL_INPUT_L              E_USQ_BUS%CAL_INPUT_L 
E_USQ_CAL%CAL_H                    E_USQ_CAL%CAL_H 
E_USQ_CSM%UMIB_H                   E_USQ_CSM%UMIB_H 
E_USQ%ICTX_H                       E_USQ%ICTX_H 
E_USQ_INQ%IQ_OUT_H                 E_USQ_INQ%IQ_OUT_H 
E_USQ_LST%SELECT_IQ_H              E_USQ_LST%SELECT_IQ_H 
E_USQ_STL%LATE_USQ_STALL_L         E_USQ_STL%LATE_USQ_STALL_L 
E_USQ_STL%VERY_LATE_USQ_STALL_L    E_USQ_STL%VERY_LATE_USQ_STALL_L 
E_USQ_TST%FORCE_TEST_ADDR_L        E_USQ_TST%FORCE_TEST_ADDR_L 
I%IMEM_MEXC_H                      I%IMEM_MEXC_H 
I%IQ_BUS_H                         I%IQ_BUS_H 
K%RESET_L                          K_E%RESET_L 
M%MME_TRAP_H                       M%MME_TRAP_L 
M%TB_PERR_TRAP_H                   M%TB_PERR_TRAP_L 
T%CS_TEST_H                        T%CS_TEST_H 

9.7 Revision History 

Table 9-24: Revision History 

Rev    Who                     When          Description of change 

0.0    Elizabeth M. Cooper     06-Mar-1989   Release for external review. 
0.1    Elizabeth M. Cooper     14-Sep-1989   Post-modelling update. 
0.5    Elizabeth M. Cooper     10-Dec-1989   Updates for Rev 0.5 spec release. 
0.5A   Elizabeth M. Cooper     5-Jan-1990    Remove vector microtrap and V bit from IQ. 
0.5B   Elizabeth M. Cooper     20-Jun-1990   Accumulated updates. 
0.6A   Elizabeth M. Cooper     26-Nov-1990   Final updates. 
0.6B   Elizabeth M. Cooper,    12-Dec-1990   Final final updates. 
       Tim C. Fischer 
0.6C   Elizabeth M. Cooper     1-Jan-1991    Add signal cross reference tables. 
0.6D   Elizabeth M. Cooper     13-Feb-1991   Add description of patch revision. 

Chapter 10 

The Interrupt Section 



10.1 Overview 

The interrupt section receives interrupt requests from both internal and external sources, and 
compares the IPL associated with the interrupt request to the current interrupt level in the PSL. If 
the interrupt request is for an IPL that is higher than the current PSL IPL, the interrupt section 
signals an interrupt request to the microsequencer which will initiate a microcode interrupt 
handler at the next macroinstruction boundary. 

When an interrupt is serviced by the Ebox microcode, the interrupt section provides an encoded 
interrupt ID on E_BUS%ABUS_L<20:16>, which allows the microcode to determine the highest 
priority interrupt request that is pending. Interrupt requests are cleared in one of three ways, 
depending on the type of request. 

Software interrupt requests are supported via a 15-bit SISR register, which is read and written 
by the microcode, and which makes requests to the interrupt generation logic. 

Both full and subset interval timer support is provided, based on the state of the ICCS_EXT bit 
in the ECR processor register, as described in Section 8.5.22. If ECR<ICCS_EXT>=0, a subset 
interval timer is supported by implementing the interrupt enable bit of the ICCS processor 
register in internal logic. If ECR<ICCS_EXT>=1, a full interval timer is supported, and external 
logic must implement the full ICCS, ICR, and NICR processor registers. In this instance, reads 
from and writes to these registers are converted to I/O space addresses and transmitted off-chip, 
as described in Section 2.12, Processor Registers. 

10.2 Interrupt Summary 

Interrupt requests received from external logic are divided into two categories: those received by 
edge-sensitive logic, and those received by level-sensitive logic. Both are synchronized to internal 
clocks. In addition, there are several internal sources of interrupt requests. 

10.2.1 External Interrupt Requests Received by Edge-Sensitive Logic 

Five of the external interrupt requests are received by edge-sensitive logic and synchronized to 
internal clocks. These signals request the following special-purpose interrupts. 

• P%HALT_L: The assertion of P%HALT_L causes the CPU to enter the console at IPL 1F 
(hex) at the next macroinstruction boundary. This interrupt is not gated by the current 
IPL, and always results in console entry, even if the IPL is already 1F (hex). Note that the 
implementation of this event is different from a normal interrupt, in which a PC/PSL pair 
is pushed onto the interrupt stack. For this event, the current PC, PSL, and halt code 
are stored in the SAVPC and SAVPSL processor registers. The mechanism by which the 
console is entered, and a description of the SAVPC and SAVPSL processor registers, are given 
in Section 15.4, Console Halt and Halt Interrupt. 

• P%PWRFL_L: The assertion of P%PWRFL_L indicates that a power failure is pending. 
This results in the dispatch of the interrupt to the operating system at IPL 1E (hex) through 
SCB vector 0C (hex). 

• P%H_ERR_L: The assertion of P%H_ERR_L indicates that a hard error has been detected 
in the system environment. This results in the dispatch of the interrupt to the operating 
system at IPL 1D (hex) through SCB vector 60 (hex). 

• P%S_ERR_L: The assertion of P%S_ERR_L indicates that a soft error has been detected in 
the system environment. This results in the dispatch of the interrupt to the operating system 
at IPL 1A (hex) through SCB vector 54 (hex). 

• P%INT_TIM_L: The assertion of P%INT_TIM_L indicates that the interval timer period has 
expired. If the interrupt enable bit in the ICCS processor register is set (whether this bit is 
implemented internally or externally), an interrupt is dispatched to the operating system at 
IPL 16 (hex) through SCB vector C0 (hex). If ICCS<6> is not set, no interrupt is dispatched. 

Each signal must make a high-to-low transition to assert the interrupt request. A pseudo-edge 
detect circuit is used to capture this transition asynchronously. Details of the edge detect logic 
are given in Section 10.3.1. Because these are special-purpose interrupt requests with an implied 
SCB vector, no acknowledgement of the interrupt is required. Ebox microcode explicitly clears 
the interrupt request when the interrupt is serviced. 
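
The gating rules for these five requests can be summarized in a small lookup. This is an illustrative sketch only: the table and function names are invented, and the comparison against the PSL IPL follows the overview statement that a request is signalled when its IPL is higher than the current PSL IPL.

```python
# signal -> (request IPL, SCB vector); the halt request has no SCB vector.
SPECIAL_REQUESTS = {
    "P%HALT_L":    (0x1F, None),
    "P%PWRFL_L":   (0x1E, 0x0C),
    "P%H_ERR_L":   (0x1D, 0x60),
    "P%S_ERR_L":   (0x1A, 0x54),
    "P%INT_TIM_L": (0x16, 0xC0),
}


def is_taken(signal: str, psl_ipl: int, iccs_ie: bool = True) -> bool:
    """Decide whether a latched special-purpose request is taken."""
    ipl, _vector = SPECIAL_REQUESTS[signal]
    if signal == "P%HALT_L":
        return True                 # not gated by the current IPL
    if signal == "P%INT_TIM_L" and not iccs_ie:
        return False                # gated by ICCS<6>
    return ipl > psl_ipl
```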

10.2.2 External Interrupt Requests Received by Level-Sensitive Logic 

Four of the external interrupt requests are received by level-sensitive logic and synchronized to 
internal clocks. These signals request general-purpose interrupts at the following IPLs. 

Interrupt       Request IPL 
Request         (Hex)    (Dec) 

P%IRQ_L<3>      17       23 
P%IRQ_L<2>      16       22 
P%IRQ_L<1>      15       21 
P%IRQ_L<0>      14       20 

Each signal must be driven low and remain low to assert the interrupt request. When one of 
these interrupts is to be serviced, the Ebox microcode acknowledges the interrupt by issuing an 
NDAL read of word length to one of four longword-aligned interrupt vector offset registers to 
obtain the SCB offset through which the interrupt should be dispatched. The address of the 
register depends on the interrupt being serviced, as shown in Table 10-1. 



Table 10-1: Interrupt Vector Offset Registers 

Interrupt       Vector Offset       Processor 
Request         Register Address    Register [1] 

P%IRQ_L<3>      E100010C            IAK17 
P%IRQ_L<2>      E1000108            IAK16 
P%IRQ_L<1>      E1000104            IAK15 
P%IRQ_L<0>      E1000100            IAK14 

[1] Direct access to the interrupt vector offset registers is provided via processor register reads 
for system test. Software references to these processor registers during normal system operation 
can result in UNDEFINED behavior. 
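
The request-to-IPL and request-to-register relationships are both linear in the request number, which the following sketch makes explicit. The function names are invented for the example; the constants come from the two tables above.

```python
IAK_BASE = 0xE1000100   # address of IAK14, the register for P%IRQ_L<0>
IRQ_BASE_IPL = 0x14     # IPL requested by P%IRQ_L<0>


def irq_ipl(n: int) -> int:
    """IPL requested by P%IRQ_L<n>, n in 0..3."""
    if not 0 <= n <= 3:
        raise ValueError("P%IRQ_L has only bits <3:0>")
    return IRQ_BASE_IPL + n


def iak_address(n: int) -> int:
    """Address of the longword-aligned vector offset register for P%IRQ_L<n>."""
    if not 0 <= n <= 3:
        raise ValueError("P%IRQ_L has only bits <3:0>")
    return IAK_BASE + 4 * n
```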



In response, the microcode expects to receive an interrupt SCB vector offset, which is shown in 
Figure 10-1. The fields are described in Table 10-2. 

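Figure 10-1: Interrupt SCB Vector Offset 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00 
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
| x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x|       System Control Block Offset       |PR|IL| :IAK1x 
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 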


Table 10-2: Interrupt SCB Vector Offset 

Name                          Extent  Description 

IL                            0       Interrupt Level Override. In normal operation, the IPL at 
                                      which the interrupt is serviced is implied by the request 
                                      signal that was asserted. If the IL bit is set in the 
                                      interrupt vector offset, the IPL at which the interrupt is 
                                      taken is forced to 17 (hex). This capability supports 
                                      external buses, such as the Q-bus, that cannot guarantee 
                                      that the device that responds with the interrupt SCB vector 
                                      offset is the device that originally requested the interrupt. 

                                      For example, the Q-bus has four separate interrupt request 
                                      signals that correspond to P%IRQ_L<3:0> but only one signal 
                                      to daisy chain the interrupt grant. Furthermore, devices on 
                                      the Q-bus are ordered so that higher priority devices are 
                                      electrically closer to the bus master. If a P%IRQ_L<1> 
                                      request is being serviced, there is no guarantee that a 
                                      higher priority device will not intercept the grant. 
                                      Software must determine the level of the device that was 
                                      serviced and set the IPL to the correct value. 

PR                            1       Passive Release Flag. In certain circumstances, notably in 
                                      multi-processor configurations, an interrupt may be 
                                      requested but removed by the time the microcode acknowledges 
                                      it by reading the interrupt vector offset register. If the 
                                      PR bit is set in the interrupt SCB vector offset, the 
                                      microcode treats this interrupt as an internal passive 
                                      release and resumes the interrupted instruction stream 
                                      without dispatching the interrupt. 

                                      If the interrupt request is deasserted before the microcode 
                                      reads the interrupt ID, the ID will be zero, indicating that 
                                      no interrupt is pending. In that instance, no read of the 
                                      interrupt vector offset register is done, and the microcode 
                                      generates an immediate passive release. 

System Control Block Offset   15:2    Longword offset from the start of the SCB of the vector to 
                                      use to dispatch this interrupt. After zero-extending to 
                                      longword length, microcode adds this value to the contents 
                                      of the SCBB register, reads that location, and uses it as 
                                      the SCB vector with which to dispatch the interrupt to the 
                                      operating system. 

NOTE 

If both the PR and IL bits are set in the interrupt SCB vector offset, the PR bit takes 
priority and a passive release is done. 
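
The handling of the returned longword can be sketched as follows. This is not the microcode; the function name and argument names are invented, and the sketch assumes that the offset is used with bits <1:0> forced to zero (so that it stays longword aligned) and that bits <31:16> are ignored.

```python
IL_BIT = 1 << 0
PR_BIT = 1 << 1
OFFSET_MASK = 0xFFFC      # bits <15:2>
FORCED_IPL = 0x17


def decode_vector_offset(value: int, implied_ipl: int, scbb: int):
    """Return None for a passive release, else (ipl, scb_entry_address)."""
    if value & PR_BIT:
        return None                      # PR takes priority over IL
    ipl = FORCED_IPL if value & IL_BIT else implied_ipl
    offset = value & OFFSET_MASK         # zero-extended to longword length
    return ipl, scbb + offset
```

For example, a returned value of 0x201 while servicing P%IRQ_L<1> (implied IPL 15 hex) yields IPL 17 (hex) and the SCB entry at SCBB + 0x200.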



10.2.3 Internal Interrupt Requests 

The Cbox, Ibox, and Mbox report error conditions by asserting internal interrupt request signals 
that are logically ORed with the synchronized versions of P%H_ERR_L and P%S_ERR_L. These 
requests are then handled in exactly the same manner as requests generated by external sources, 
as specified above. The following table details the internal interrupt sources. 



Table 10-3: Internal Interrupt Requests 

Signal               Source   Type 

C%CBOX_H_ERR_H       CBOX     H_ERR_L 
C%CBOX_S_ERR_H       CBOX     S_ERR_L 
                     IBOX     S_ERR_L 
M%MBOX_S_ERR_H       MBOX     S_ERR_L 


The performance monitoring facility requests an interrupt at IPL 1B (hex) when the 
performance counters become half full. The performance monitoring hardware asserts the signal 
E_PMN%PMON_L to perform this request. This request is serviced entirely by microcode, and 
cleared by writing to the appropriate bit in the ISR. See Chapter 18 for more details 
about the performance monitoring facilities. 

Architecturally defined software interrupt requests are implemented through an internal register 
in the interrupt section. Under control of the SISR and SIRR processor registers, which are 
described in Chapter 2, the Ebox microcode sets the appropriate bit in this register, which then 
results in the dispatch of the interrupt to the operating system at an IPL and through the SCB 
vector implied by the interrupt request. The association between the interrupt request, requested 
IPL, and SCB vector for these requests is shown in the following table. 



Table 10-4: Software Interrupts 

              Request IPL       SCB Vector 
SISR bit      (Hex)    (Dec)    (Hex) 

SISR<15>      0F       15       BC 
SISR<14>      0E       14       B8 
SISR<13>      0D       13       B4 
SISR<12>      0C       12       B0 
SISR<11>      0B       11       AC 
SISR<10>      0A       10       A8 
SISR<09>      09       09       A4 
SISR<08>      08       08       A0 
SISR<07>      07       07       9C 
SISR<06>      06       06       98 
SISR<05>      05       05       94 
SISR<04>      04       04       90 
SISR<03>      03       03       8C 
SISR<02>      02       02       88 
SISR<01>      01       01       84 

Ebox microcode explicitly clears the interrupt request when the interrupt is serviced. 
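
The table follows a regular pattern: SISR<n> requests IPL n through SCB vector 80 (hex) + 4n. The sketch below uses that pattern to pick the highest pending software request; the function names are invented for the example, and the comparison against the PSL IPL follows the rule stated in the overview.

```python
def software_vector(level: int) -> int:
    """SCB vector for software interrupt level 1..15."""
    if not 1 <= level <= 15:
        raise ValueError("software interrupt levels are 1..15")
    return 0x80 + 4 * level


def highest_software_request(sisr: int, psl_ipl: int):
    """Return (ipl, scb_vector) of the highest pending request, or None.

    A request is only returned if its IPL is higher than the PSL IPL.
    """
    for level in range(15, 0, -1):
        if (sisr >> level) & 1:
            if level > psl_ipl:
                return level, software_vector(level)
            return None   # highest pending request is masked by the IPL
    return None
```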



10.2.4 Special Considerations for Interval Timer Interrupts 

The NVAX CPU may be configured to support either a subset interval timer, or a full interval 
timer, depending on the state of ECR<dCCS_EXT>, as described in Section 8.5.22, Ebox IPRs. 
Console firmware initializes this bit to the correct state based on the system environment in 
which the CPU chip is used. 

The internal implementation of the interval timer interrupt request gates the assertion of 
P%INT_TiMJL with the internal copy of the interrupt enable bit of the ICCS processor register 
(ICCS<6>). The CPU chip does not know the source of the signal driving P%1NT_TTMJL, and 
this fact is used to allow the implementation of both a subset and full interval timer. 

If ECR<ICCS_EXT>=0, an SRM-approved subset interval timer may be implemented by driving 
P%INT_TlM_L with an oscillator whose period is 10ms. In this mode, the NICR and ICR 
processor registers are not required nor implemented, and microcode maintains the subset ICCS 
processor register with an internal copy of only the interrupt enable bit from ICCS<6>. References 
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to the ICCS processor register affect only ICCS<6>, and are handled internally without being 
transmitted on the NDAL. 

If ECR<ICCS_EXT>=1, a full interval timer consisting of the ICCS, NICE, and ICR processor 
registers may be implemented in external logic. P%INT_TIM_L is asserted when the 
programmed interval has expired. Processor register references to the ICCS, NICR, and ICR 
processor registers are converted to I/O space references and transmitted onto the NDAL, as 
described in Section 2.12, Processor Registers. However, even in this mode, microcode maintains 
the internal copy of ICCS<6> consistent with a write to ICCS that is transmitted onto the NDAL. 
As a result, if interrupts are enabled in the off-chip ICCS register, they are also allowed by the 
internal ICCS interrupt enable bit. Conversely, if interrupts are disabled in the off-chip ICCS 
register, they are also disabled by the internal bit. External logic is expected to return all 32 
bits when the ICCS processor register is read, including the correct state of the interrupt enable 
bit. Microcode does not attempt to merge the external data with the internal copy of ICCS<6> to 
satisfy a processor register read of ICCS. 

It should be noted that ECR<ICCS_EXT> has no effect on the operation of the interrupt section 
hardware. It is used strictly as a control bit which directs the microcode operation of references 
to the ICCS processor register. Independent of the state of ECR<ICCS_EXT>, processor register
writes to ICCS cause microcode to update the internal copy of the interrupt enable bit. If
ECR<ICCS_EXT>=1, references to the ICCS processor register are also transmitted onto the
NDAL. References to the NICR and ICR processor registers are always transmitted onto the 
NDAL; they are simply not used if the system implements a subset interval timer. 

Table 10-5 gives a summary of the results of references to the ICCS, NICR, and ICR processor 
registers, with both states of ECR<ICCS_EXT>. 



Table 10-5: References to Interval Timer Processor Registers

Operation            ECR<ICCS_EXT>=0                           ECR<ICCS_EXT>=1
-------------------  ----------------------------------------  ---------------------------------------------------
MTPR x,#PR$_ICCS     Update internal ICCS<6>                   Update internal ICCS<6>, write data to E1000060 (1)
MFPR #PR$_ICCS,x     Return internal ICCS<6>                   Read and return data from E1000060 (1)
MTPR x,#PR$_NICR     Write data to E1000064 (1)                Write data to E1000064 (1)
MFPR #PR$_NICR,x     Read and return data from E1000064 (1)    Read and return data from E1000064 (1)
MTPR x,#PR$_ICR      Write data to E1000068 (1)                Write data to E1000068 (1)
MFPR #PR$_ICR,x      Read and return data from E1000068 (1)    Read and return data from E1000068 (1)

(1) See Section 2.12
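
The routing summarized in Table 10-5 can be sketched as a small behavioral model. This is an illustrative sketch only, not the microcode: the class name, the `ndal` log, and the `external` dictionary standing in for off-chip timer logic are all hypothetical, and the subset-mode read is assumed to return zeros in every bit other than bit 6.

```python
# Hypothetical behavioral model of Table 10-5 (names are illustrative).
ICCS_ADDR, NICR_ADDR, ICR_ADDR = 0xE1000060, 0xE1000064, 0xE1000068


class IntervalTimerModel:
    def __init__(self, iccs_ext):
        self.iccs_ext = iccs_ext  # state of ECR<ICCS_EXT>
        self.iccs6 = 0            # internal copy of ICCS<6>
        self.ndal = []            # log of references transmitted off-chip
        self.external = {}        # stand-in for external timer registers

    def _io_write(self, addr, data):
        self.ndal.append(("write", addr, data))
        self.external[addr] = data

    def _io_read(self, addr):
        self.ndal.append(("read", addr, None))
        return self.external.get(addr, 0)

    def mtpr(self, reg, data):
        if reg == "ICCS":
            # The internal enable bit is updated in both modes.
            self.iccs6 = (data >> 6) & 1
            if self.iccs_ext:
                self._io_write(ICCS_ADDR, data)
        elif reg == "NICR":
            self._io_write(NICR_ADDR, data)  # always transmitted
        elif reg == "ICR":
            self._io_write(ICR_ADDR, data)   # always transmitted
        else:
            raise ValueError(reg)

    def mfpr(self, reg):
        if reg == "ICCS":
            if self.iccs_ext:
                # External data is returned as-is; no merge with iccs6.
                return self._io_read(ICCS_ADDR)
            return self.iccs6 << 6           # assumed: other bits read as 0
        if reg == "NICR":
            return self._io_read(NICR_ADDR)
        if reg == "ICR":
            return self._io_read(ICR_ADDR)
        raise ValueError(reg)
```

In subset mode a write to ICCS never reaches the `ndal` log, while in full mode the same write both updates `iccs6` and appears as an I/O space write.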



10.2.5 Priority of Interrupt Requests 

When multiple interrupt requests are pending, the interrupt section prioritizes the requests. 
Table 10-6 shows the relative priority (from highest to lowest) of all interrupt requests. For 
reference, this table also includes the IPL at which the interrupt is taken, and the SCB vector 
through which the interrupt is dispatched. 
Table 10-6: Relative Interrupt Priority

Interrupt          Request IPL     SCB Vector
Request            (Hex)  (Dec)    (Hex)
-----------------  -----  -----    -----------------------
P%HALT_L           1F     31       None (1)                   Highest priority
P%PWRFL_L          1E     30       0C
P%H_ERR_L (2)      1D     29       60
E_PMN%PMON_L       1B     27       None (5)
P%S_ERR_L (2)      1A     26       54
P%IRQ_L<3>         17     23       Specified by device (3)
P%IRQ_L<2>         16     22       Specified by device (3)
P%INT_TIM_L (4)    16     22       C0
P%IRQ_L<1>         15     21       Specified by device (3)
P%IRQ_L<0>         14     20       Specified by device (3)
SISR<15>           0F     15       BC
SISR<14>           0E     14       B8
SISR<13>           0D     13       B4
SISR<12>           0C     12       B0
SISR<11>           0B     11       AC
SISR<10>           0A     10       A8
SISR<09>           09     09       A4
SISR<08>           08     08       A0
SISR<07>           07     07       9C
SISR<06>           06     06       98
SISR<05>           05     05       94
SISR<04>           04     04       90
SISR<03>           03     03       8C
SISR<02>           02     02       88
SISR<01>           01     01       84                         Lowest priority

(1) Direct dispatch to console; PC, PSL placed in SAVPC, SAVPSL processor registers
(2) Includes Cbox, Ibox, and Mbox internally generated requests
(3) SCB vector offset supplied by the device
(4) When enabled by the internal ICCS<6>
(5) Interrupt processed entirely by microcode



The P%IRQ_L<2> request takes priority over the P%INT_TIM_L request, both of which
are at IPL 16 (hex). Inter-processor interrupts in multi-processor systems are requested via
P%IRQ_L<2>, and they must take priority over interval timer requests.
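
The ordering in Table 10-6 can be sketched as an ordered list that is scanned from the top. This is a minimal illustration, not the encoder circuit: the `PRIORITY` list and `highest_pending` function are hypothetical names, and a pending request is represented simply by its signal name in a set.

```python
# (request name, request IPL) in Table 10-6 order, highest priority first.
PRIORITY = [
    ("P%HALT_L", 0x1F), ("P%PWRFL_L", 0x1E), ("P%H_ERR_L", 0x1D),
    ("E_PMN%PMON_L", 0x1B), ("P%S_ERR_L", 0x1A),
    ("P%IRQ_L<3>", 0x17), ("P%IRQ_L<2>", 0x16), ("P%INT_TIM_L", 0x16),
    ("P%IRQ_L<1>", 0x15), ("P%IRQ_L<0>", 0x14),
] + [("SISR<%02d>" % n, n) for n in range(15, 0, -1)]


def highest_pending(pending):
    """Return (name, IPL) of the highest-priority pending request, or None."""
    for name, ipl in PRIORITY:
        if name in pending:
            return name, ipl
    return None
```

Because position in the list, not the IPL value, decides the winner, P%IRQ_L<2> is selected ahead of P%INT_TIM_L even though both carry IPL 16 (hex).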
10.3 Interrupt Section Structure

The interrupt section consists of three basic components: the edge detect and synchronization 
logic, the interrupt state register (ISR), and the interrupt generation logic. A block diagram of 
the interrupt section is shown in Figure 10-2. 

Figure 10-2: Interrupt Section Block Diagram 
10.3.1 Edge Detect and Synchronization Logic

10.3.1.1 Edge Detect Circuitry 

The pads for the five special-purpose external interrupt request signals contain logic which detects 
high-to-low transitions on these signals. A falling edge sets an SR flip-flop which begins the 
interrupt request process. This interrupt request process involves setting another SR flip-flop 
to register the interrupt. This second flip-flop may only be cleared by microcode. Microcode 
clears this flip-flop while servicing the interrupt request. The edge detect circuitry resets itself 
automatically (clearing the first SR) within two NDAL cycles following the low-to-high transition
of the pin. 
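
The two-flop behavior described above can be sketched as follows. This is a simplified model under stated assumptions: the class and method names are hypothetical, the pin is sampled once per call, and the self-reset of the first SR is modeled as happening on the first sample after the pin returns high rather than within two NDAL cycles.

```python
class EdgeDetect:
    """Simplified model of one edge-sensitive interrupt pad (active low)."""

    def __init__(self):
        self.prev_pin = 1  # pin deasserted (high)
        self.edge = 0      # first SR: falling edge detected
        self.request = 0   # second SR: registered interrupt request

    def sample(self, pin):
        if self.prev_pin == 1 and pin == 0:
            # High-to-low transition starts the request process.
            self.edge = 1
            self.request = 1
        elif pin == 1:
            # Edge detector resets itself once the pin is high again.
            self.edge = 0
        self.prev_pin = pin

    def microcode_clear(self):
        # Only microcode may clear the registered request.
        self.request = 0
```

A pin that stays low after microcode has cleared the request does not raise a second request; a new falling edge is required.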

10.3.1.2 Interrupt Synchronization

The pads for all external interrupt request signals (both the edge and level sensitive types) contain 
synchronizers to allow the use of asynchronous signals for interrupt requests. The pin signals 
are synchronized to the internal NVAX clocks and are then passed to the ISR. More deterministic 
timing behavior may be desired in some applications such as during test. This may be achieved 
by driving the signals synchronously with respect to the input clocks. The chapter on Electrical 
Characteristics should be consulted for details about setup and hold times. 

10.3.2 Interrupt State Register

The interrupt state register is a composite register that implements the 15-bit architecturally 
defined SISR register, the internal copy of the interrupt enable bit from the ICCS processor 
register, the interrupt latch for the performance monitoring facility interrupt, and the interrupt 
request latches for the 5 special-purpose and 4 general-purpose interrupts. The ISR contains two 
kinds of elements: SR flops for the special-purpose interrupt requests, and latches for the other 
requests. The following table lists the types and positions of all elements in the ISR. 





          State
ISR bit   Element   Description
-------   -------   ----------------------------------------------------------------
31        SR        Interrupt request for P%HALT_L interrupt
30        SR        Interrupt request for P%PWRFL_L interrupt
29        SR        Interrupt request for P%H_ERR_L and internal hard error interrupts
28        SR        Interrupt request for E_PMN%PMON_L, the performance monitoring
                    facility interrupt
27        SR        Interrupt request for P%S_ERR_L and internal soft error interrupts
26        L         Interrupt request for P%IRQ_L<3> interrupt
25        L         Interrupt request for P%IRQ_L<2> interrupt
24        SR        Interrupt request for P%INT_TIM_L interrupt
23        L         Interrupt request for P%IRQ_L<1> interrupt
22        L         Interrupt request for P%IRQ_L<0> interrupt
15:1      L         SISR<15:1> latches and requests for software interrupts
0         L         Internal ICCS<6> latch

State Element:
SR = SR flop
L = Latch



Synchronized inputs from the external special-purpose interrupt requests are logically ORed with 
the internal requests from the Cbox, Ibox, and Mbox. The assertion of one of these signals causes 
the appropriate request flop to be set in ISR<31:29,27,24>. These request flops are cleared under 
Ebox microcode control when written with a 1 from the corresponding bits of E_BUS%WBUS_L.
Synchronized inputs from the general-purpose interrupt requests are loaded into the appropriate
latch in ISR<26:25,23:22>. These request latches are cleared when the interrupting device 
deasserts the interrupt request in response to a CPU request for an interrupt vector offset. 

The performance monitoring facility interrupt request is loaded into the request flop in
ISR<28>. The request is cleared under Ebox microcode control when written with a 1 from 
E_BUS%WBUS_L<28>. 

SISR<15:1> is implemented via ISR<15:1>, and is loaded from bits <15:1> of E_BUS%WBUS_L
under Ebox microcode control. These request latches are cleared under Ebox microcode control 
when a new value is loaded from E_BUS%WBUS_L. 

The internal copy of the interrupt enable bit in the ICCS processor register (ICCS<6>) is 
implemented via ISR<0>, and is loaded from E_BUS%WBUS_L<0> under Ebox microcode control.
Local logic gates the interval timer request from ISR<24> with the state of ISR<0>. 

The interrupt request elements of the interrupt state register (ISR<31:22,15:1>) go to the 
interrupt generation logic. ISR<0> and ISR<15:1> may also be read onto E_BUS%ABUS_L for
return to the Ebox. 

10.3.3 Interrupt Generation Logic 

The interrupt generation logic priority encodes all interrupt requests from the interrupt state 
register to determine the highest priority request. The output of the encoder is the request IPL 
and the interrupt ID of the highest priority request. If any request is pending, the request IPL is 
compared against E_PSL%PSL_H<20:16> from the Ebox. If the request IPL is higher than the PSL
IPL, or if the request is for P%HALT_L (P%HALT_L is not gated by the IPL), E%INT_REQ_H is
asserted to the microsequencer. 

The assertion of E%INT_REQ_H causes the microsequencer to initiate a microcode interrupt handler
at the next macroinstruction boundary. The same signal is available on the microtest bus
(E_BUS%UTEST_L<0>) as a microbranch condition, which is checked by the Ebox microcode during
long instructions. 

Along with the request IPL, the interrupt generation logic provides an encoded interrupt ID 
that identifies the highest priority interrupt. The interrupt ID is read onto bits <20:16> of 
E_BUS%ABUS_L along with ISR<0> and ISR<15:1> when microcode references the A/INT.SYS 
source. For each interrupt, the interrupt ID encoding, request IPL, ISR bit number, method for
clearing the interrupt, and SCB vector are shown in Table 10-7.



Table 10-7: Summary of Interrupts

Interrupt        Int ID         Request IPL    ISR Bit   Reset                SCB Vector
Request          (Hex)  (Dec)   (Hex)  (Dec)   (Dec)     Method               (Hex)
---------------  -----  -----   -----  -----   -------   ------------------   --------------------
P%HALT_L         1F     31      1F     31      31        Write 1 to ISR bit   Console Halt
P%PWRFL_L        1E     30      1E     30      30        Write 1 to ISR bit   0C
P%H_ERR_L (1)    1D     29      1D     29      29        Write 1 to ISR bit   60
E_PMN%PMON_L     1B     27      1B     27      28 (2)    Write 1 to ISR bit   Handled by microcode
P%S_ERR_L (1)    1A     26      1A     26      27 (2)    Write 1 to ISR bit   54
P%IRQ_L<3>       17     23      17     23      26        Read IAK17 IPR       Supplied by device
P%IRQ_L<2>       16     22      16     22      25        Read IAK16 IPR       Supplied by device
P%INT_TIM_L      1C (3) 28      16     22      24 (2)    Write 1 to ISR bit   C0
P%IRQ_L<1>       15     21      15     21      23        Read IAK15 IPR       Supplied by device
P%IRQ_L<0>       14     20      14     20      22        Read IAK14 IPR       Supplied by device
SISR<15>         0F     15      0F     15      15        Write 0 to ISR bit   BC
SISR<14>         0E     14      0E     14      14        Write 0 to ISR bit   B8
SISR<13>         0D     13      0D     13      13        Write 0 to ISR bit   B4
SISR<12>         0C     12      0C     12      12        Write 0 to ISR bit   B0
SISR<11>         0B     11      0B     11      11        Write 0 to ISR bit   AC
SISR<10>         0A     10      0A     10      10        Write 0 to ISR bit   A8
SISR<09>         09     09      09     09      09        Write 0 to ISR bit   A4
SISR<08>         08     08      08     08      08        Write 0 to ISR bit   A0
SISR<07>         07     07      07     07      07        Write 0 to ISR bit   9C
SISR<06>         06     06      06     06      06        Write 0 to ISR bit   98
SISR<05>         05     05      05     05      05        Write 0 to ISR bit   94
SISR<04>         04     04      04     04      04        Write 0 to ISR bit   90
SISR<03>         03     03      03     03      03        Write 0 to ISR bit   8C
SISR<02>         02     02      02     02      02        Write 0 to ISR bit   88
SISR<01>         01     01      01     01      01        Write 0 to ISR bit   84
No Interrupt     00     00                               Dismiss interrupt

(1) Includes Cbox, Ibox, and Mbox internally generated requests
(2) Write-1-to-clear ISR bit is different than IPL and interrupt ID
(3) Interrupt ID is different than IPL



The interrupt ID is the same as the request IPL for all interrupt requests except for the interval 
timer request. 
DESIGN CONSTRAINT

A value of zero for the interrupt ID must be returned if an interrupt is no longer 
present, or if the highest priority interrupt request is no longer higher than the PSL 
IPL. Normally, once an interrupt request is made, it remains until it is cleared by the 
microcode. However, the level-sensitive interrupt requests may be deasserted after the 
interrupt is dispatched, but before the microcode reads the interrupt ID. Therefore, it is 
possible that the highest remaining interrupt has a request IPL lower than the current 
PSL IPL. If zero is not returned for the interrupt ID in this instance, the processor will 
not function correctly. 
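
The constraint above can be sketched as a gate on the value returned in the interrupt ID field. This is a minimal sketch, not the hardware: `interrupt_id` is a hypothetical helper, and its `highest` argument is assumed to be the (interrupt ID, request IPL) pair of the top remaining request at the moment microcode reads the field, or `None` if nothing is pending.

```python
HALT_ID = 0x1F  # interrupt ID of P%HALT_L, which is not gated by the IPL


def interrupt_id(highest, psl_ipl):
    """Return the interrupt ID to present, or 0 if none may be taken."""
    if highest is None:
        return 0                      # request no longer present
    int_id, request_ipl = highest
    if int_id == HALT_ID or request_ipl > psl_ipl:
        return int_id
    return 0                          # remaining request is not above PSL IPL
```

For example, if a device on P%IRQ_L<2> withdraws its level-sensitive request after dispatch while the PSL IPL is 15 (hex), and only SISR<03> remains, `interrupt_id((0x03, 0x03), 0x15)` yields 0 and microcode dismisses the interrupt.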



10.4 Ebox Microcode Interface 

The Ebox microcode interfaces with the interrupt section primarily through reads (via 
E_BUS%ABUS_L) and writes (via E_BUS%WBUS_L) of the ISR accomplished through the A/INT.SYS
and DST/INT.SYS decodes. These decodes provide access to the so-called INT.SYS register, which 
is shown in Figure 10-3. The fields of the register are listed in Table 10-8. 

Figure 10-3: IPR 7A (hex), INTSYS 



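 31 30 29 28 27 26 25 24 23 22 21 20      16 15              01 00
+--+--+--+--+--+--+--+--+--+--+--+----------+------------------+--+
|  |  |  |  |  | 0| 0|  | 0| 0| 0|  INT.ID  |    SISR<15:1>    |  | : INTSYS
+--+--+--+--+--+--+--+--+--+--+--+----------+------------------+--+
  |  |  |  |  |        |                                         |
  |  |  |  |  |        |                                         +-- ICCS<6>
  |  |  |  |  |        +-- INT_TIM_RESET
  |  |  |  |  +-- S_ERR_RESET
  |  |  |  +-- PMON_RESET
  |  |  +-- H_ERR_RESET
  |  +-- PWRFL_RESET
  +-- HALT_RESET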



Table 10-8: INTSYS Field Descriptions

Name            Extent  Type   Description
--------------  ------  -----  ----------------------------------------------------------------------
ICCS<6>         0       RW,0   This field contains the internal copy of the interrupt enable bit from
                               the ICCS processor register. It is set to 0 by microcode at powerup.

SISR            15:1    RW,0   This field contains the 15 architecturally-defined software interrupt
                               request bits. It is set to 0 by microcode at powerup.

INT.ID          20:16   RO     This field contains the encoding of the highest priority interrupt
                               request as listed in Table 10-7. Writes to this field are ignored.

INT_TIM_RESET   24      WC,0   Writing a 1 to this field clears the P%INT_TIM_L interrupt request.
                               Writing a 0 has no effect on the request. The field is read as a 0 and
                               the interrupt request is cleared by microcode at powerup.

S_ERR_RESET     27      WC,0   Writing a 1 to this field clears the P%S_ERR_L interrupt request.
                               Writing a 0 has no effect on the request. The field is read as a 0 and
                               the interrupt request is cleared by microcode at powerup.

PMON_RESET      28      WC,0   Writing a 1 to this field clears the E_PMN%PMON_L interrupt request.
                               Writing a 0 has no effect on the request. The field is read as a 0 and
                               the interrupt request is cleared by microcode at powerup.

H_ERR_RESET     29      WC,0   Writing a 1 to this field clears the P%H_ERR_L interrupt request.
                               Writing a 0 has no effect on the request. The field is read as a 0 and
                               the interrupt request is cleared by microcode at powerup.

PWRFL_RESET     30      WC,0   Writing a 1 to this field clears the P%PWRFL_L interrupt request.
                               Writing a 0 has no effect on the request. The field is read as a 0 and
                               the interrupt request is cleared by microcode at powerup.

HALT_RESET      31      WC,0   Writing a 1 to this field clears the P%HALT_L interrupt request.
                               Writing a 0 has no effect on the request. The field is read as a 0 and
                               the interrupt request is cleared by microcode at powerup.



DESIGN CONSTRAINT 

When read onto E_BUS%ABUS_L, INT.SYS<31:27,24> must be zero. Microcode
updates the internal copy of ICCS<6> and SISR<15:1> by reading the INT.SYS
register, modifying the appropriate bits, and writing the updated value back. The
write-one-to-clear bits must be read as zero because the microcode does not mask them
out before writing them back.
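
The reason for this constraint can be sketched with a small model of the read-modify-write sequence. The sketch is illustrative only: `IntSys`, `set_sisr_bit`, and the split of state into `requests` and `state` are hypothetical, and the field positions follow Table 10-8.

```python
W1C_MASK = (0b11111 << 27) | (1 << 24)  # INT.SYS<31:27,24>
SISR_MASK = 0xFFFE                      # INT.SYS<15:1>
ICCS6_MASK = 0x0001                     # INT.SYS<0>


class IntSys:
    def __init__(self):
        self.requests = 0  # request flops behind the write-one-to-clear bits
        self.state = 0     # SISR<15:1> and the internal copy of ICCS<6>

    def read(self, int_id=0):
        # Write-one-to-clear positions always read as zero.
        return ((int_id & 0x1F) << 16) | self.state

    def write(self, value):
        self.requests &= ~(value & W1C_MASK)  # a 1 clears, a 0 has no effect
        self.state = value & (SISR_MASK | ICCS6_MASK)  # INT.ID bits ignored


def set_sisr_bit(intsys, n):
    """Unmasked read-modify-write, as the text says microcode performs."""
    intsys.write(intsys.read() | (1 << n))
```

If the request flops were visible on a read, the unmasked write-back in `set_sisr_bit` would clear every pending special-purpose request as a side effect; reading them as zero makes the write-back harmless.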

MICROCODE RESTRICTION 

The INT.SYS register is not bypassed. A write to INT.SYS in microinstruction n must 
not be followed by a read of INT.SYS sooner than microinstruction n+4. 

MICROCODE RESTRICTION 

Changes to machine state that affect the generation of interrupts (PSL<IPL>, ICCS<6>, 
or SISR<15:1>) done by microinstruction n must not be followed by a LAST CYCLE 
microinstruction sooner than microinstruction n+4 if the change is to be observed by 
the next macroinstruction. 

10.5 Processor Register Interface 

Software can interact with the interrupt section hardware and microcode via references to 
processor registers, as follows: 

• ICCS: References to the ICCS processor register allow access to the copy of ICCS<6> that is 
implemented in INT.SYS<0>, as described in Section 10.2.4. 

• NICR, ICR: References to the NICR and ICR processor registers are transmitted off-chip to 
an optional full interval timer implementation as described in Section 10.2.4. 

• SISR, SIRR: References to the architecturally-defined SISR and SIRR processor registers
allow access to SISR<15:1>, which are implemented in INT.SYS<15:1>. 
• ECR: References to ECR<ICCS_EXT> select the interval timer configuration, as described in
Section 10.2.4. 

• IAK14, IAK15, IAK16, IAK17: Reads of the IAK processor registers allow diagnostic and test
software direct access to device interrupt vectors, as described in Section 10.2.2. References 
to these processor registers during normal system operation can result in UNDEFINED 
behavior. 

• INTSYS: References to the INTSYS processor register allow diagnostic and test software 
direct access to the INT.SYS register. Reads of the INTSYS processor register return the 
format shown in Figure 10-3. Writes of the INTSYS processor register are internally masked
by microcode such that only the left-half write-to-clear bits are written. Other bits remain
unchanged. Writes to the INTSYS processor register during normal system operation can result in
UNDEFINED behavior. 

10.6 Interrupt Section Interfaces 

10.6.1 Ebox Interface 

10.6.1.1 Signals From Ebox 

• E_BUS%WBUS_L: Write data bus, from which ICCS<6> and SISR<15:1> are loaded, and from 
which the write-one-to-clear interrupt latches are cleared.

• E_PMN%PMON_L: Performance monitoring facility interrupt request.

• E_PSL%PSL_H<20:16>: IPL field from the current PSL. 

• E_STL%F_NOP_S5_H: Force a NOP into S5 of the MIB decode pipe when an S3 or S4 stall exists

• E_STL%LATE_F_NOP_S4_H: Force a NOP into S4 of the MIB decode pipe when an S3 stall exists 

• E_STL%LATE_STALL_S4_H : Stall the MIB decode pipe when an S4 stall exists 

10.6.1.2 Signals To Ebox

• E_BUS%ABUS_L: A-port operand bus, on which ICCS<6>, SISR<15:1>, and the interrupt ID 
are returned. 

10.6.2 Microsequencer Interface 
10.6.2.1 Signals from Microsequencer 

• E_USQ%MIB_H<31:20>: MIB lines used to decode the writes/reads to INT.SYS

• E_USQ%MIB_L<31:20>: MIB lines used to decode the writes/reads to INT.SYS

• E_USQ%UTSEL_H<4:0>: Microtest bus select code.

• E_USQ%UTSEL_L<4:0> : Microtest bus select code. 
10.6.2.2 Signals To Microsequencer

• E%INT_REQ_H: Interrupt pending.

• E_BUS%UTEST_L<0>: Microtest bus.

10.6.3 Cbox interface 
10.6.3.1 Signals From Cbox 

• C%CBOX_H_ERR_H: Hard error interrupt request.

• C%CBOX_S_ERR_H: Soft error interrupt request.

10.6.4 Ibox interface 
10.6.4.1 Signals From ibox 

• I%IBOX_S_ERR_L: Soft error interrupt request.

10.6.5 Mbox Interface 
10.6.5.1 Signals From Mbox 

• M%MBOX_S_ERROR_H: Soft error interrupt request.

10.6.6 Pin Interface 
10.6.6.1 Input Pins 

• P%HALT_L: Special-purpose halt "interrupt" signal, sampled by edge-sensitive logic.

• P%H_ERR_L: Special-purpose hard error interrupt signal, sampled by edge-sensitive logic.

• P%INT_TIM_L: Special-purpose interval timer interrupt signal, sampled by edge-sensitive
logic.

• P%IRQ_L<3:0>: General-purpose interrupt signals, sampled by level-sensitive logic.

• P%PWRFL_L: Special-purpose power failure interrupt signal, sampled by edge-sensitive 
logic. 

• P%S_ERR_L: Special-purpose soft error interrupt signal, sampled by edge-sensitive logic.

10.6.7 Signal Dictionary 

Table 10-9: Cross-reference of all names appearing in the Interrupt chapter

Schematic Name             Behavioral Model Name
-------------------------  -------------------------
C%CBOX_H_ERR_H             C%CBOX_H_ERR_H
C%CBOX_S_ERR_H             C%CBOX_S_ERR_H
E%INT_REQ_H                E%INT_REQ_H
E_BUS%ABUS_L<31:0>         E_BUS%ABUS_H
E_BUS%UTEST_L<0>           E_BUS%UTEST_H
E_BUS%WBUS_L<31:0>         E_BUS%WBUS_H
E_PMN%PMON_L               E_PMN%PMON_H
E_PSL%PSL_H<20:16>         E_PSL%PSL_H
E_STL%F_NOP_S5_H           E_STL%F_NOP_S5_H
E_STL%LATE_F_NOP_S4_H      E_STL%LATE_F_NOP_S4_H
E_STL%LATE_STALL_S4_H
10.7 Revision History



Table 10-10: Revision History

Who           When          Description of change
-----------   -----------   ----------------------------------------------------------------
Mike Uhler    06-Mar-1989   Release for external review.
Mike Uhler    14-Dec-1989   Update for second-pass release.
Ron Preston   09-Jan-1990   Changes to simplify implementation.
Mike Uhler    20-Jul-1990   Update for change to performance monitoring interrupt request and
                            reflect implementation.
Ron Preston   07-Feb-1991   Update to reflect Pass 1 implementation.
Chapter 11
The Fbox 



11.1 Overview 

This chapter describes the floating point unit of the NVAX CPU chip. Only the major functional 
blocks, their interfaces to each other, and the interface to the rest of the NVAX system are 
described here. Circuit level implementation details are not of primary concern in this document. 

11.2 Introduction 

The Fbox is the floating point unit in the NVAX CPU chip. The Fbox is a four-stage pipelined
floating point processor, with an additional stage devoted to assisting division. It interacts with
three different segments of the main CPU pipeline: the Microsequencer in S2 and the
Ebox in S3 and S4. The Fbox runs semi-autonomously with respect to the rest of the CPU chip and
supports the following operations:

• VAX Floating Point Instructions and Data types 

The Fbox provides instruction and data support for VAX floating point instructions. VAX F-, 
D-, and G-floating point data types are supported. 

• VAX Integer Instructions 

The Fbox implements longword integer multiply instructions. 

• Pipelined Operation 

Except for the divide instructions, DIV{F,D,G}, the Fbox can start a new single precision
floating point instruction every cycle and a double precision floating point or an integer
multiply instruction every two cycles. The Ebox can supply two 32-bit operands or one 64-bit
operand to the Fbox every cycle on two 32-bit input operand buses. The Fbox drives the
result operand to the Ebox on a 32-bit result bus.

• Conditional "Mini-Round" Operation 

Result latency is conditionally reduced by one cycle for the most frequently used instructions. 
Stage 3 can perform a "mini-round" operation on the LSB's of the fraction for all ADD, SUB, 
and MUL floating instructions. If the "mini-round" operation does not fail, then stage 3 drives 
the result directly to the output, bypassing stage 4 and saving a cycle of latency. 

• Fault and Exception Handling 

The Ebox coordinates the fault and exception handling with the Fbox. Any fault or exception 
condition received from the Ebox is retired in the proper order. If the Fbox receives or 
generates any fault or exception condition, it does not change the flow of instructions in 
progress within the Fbox pipe. 
Figure 11-1 is a top level block diagram of the Fbox showing the six major functional blocks
within the Fbox and their interconnections.



Figure 11-1 : Fbox block diagram 
11.3 Fbox Functional Overview

The Fbox is the floating point accelerator for the NVAX CPU. Its instruction repertoire includes 
all VAX base group floating point instructions. The data types that are supported are F, D, and 
G. The additional integer instructions that are supported are MULL2 and MULL3.

The number of internal execution cycles and the total number of cycles to complete an instruction
within the Fbox are measured as shown in Figure 11-2.

Figure 11-2: Fbox Execute Cycle Diagram

The internal execution time for all instructions except MUL{D,G,L} and DIV{F,D,G} is four cycles.
The internal execution time of the various Fbox operations is given in Table 11-1.



Table 11-1: Fbox Internal Execute Cycles

INSTRUCTION    F     D     G     L
-----------   ---   ---   ---   ---
MUL             4     5     5     5
DIV            14    25    24     -
ALL OTHER       4     4     4     4



The total number of cycles taken by the Fbox to complete an instruction is given in Table 11-2.
Note that this includes the cycles taken for opcode and operand transfer; in particular, the dead
cycle between the opcode and the first operand is counted.



Table 11-2: List of the Fbox Total Execute Cycles

INSTRUCTION    F     D     G     L
-----------   ---   ---   ---   ---
MUL             7    10    10     8
DIV            17    30    29     -
ALL OTHER       7     9     9     -
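
As a worked illustration of how the two tables relate, the sketch below encodes Tables 11-1 and 11-2 and subtracts one from the other. The dictionary names and the `transfer_overhead` helper are hypothetical, and the "overhead" figure is simply the arithmetic difference between the tables, not a quantity the specification names.

```python
# Cycle counts copied from Table 11-1 (internal) and Table 11-2 (total).
INTERNAL = {
    "MUL": {"F": 4, "D": 5, "G": 5, "L": 5},
    "DIV": {"F": 14, "D": 25, "G": 24},
    "OTHER": {"F": 4, "D": 4, "G": 4, "L": 4},
}
TOTAL = {
    "MUL": {"F": 7, "D": 10, "G": 10, "L": 8},
    "DIV": {"F": 17, "D": 30, "G": 29},
    "OTHER": {"F": 7, "D": 9, "G": 9},
}


def transfer_overhead(op, dtype):
    """Total cycles minus internal execute cycles for one table entry."""
    return TOTAL[op][dtype] - INTERNAL[op][dtype]


# Collect the difference for every entry present in both tables.
overheads = {
    (op, dtype): transfer_overhead(op, dtype)
    for op in TOTAL
    for dtype in TOTAL[op]
}
```

Every F and L entry differs by 3 cycles and every D and G entry by 5, which is consistent with the 64-bit data types needing extra cycles on the 32-bit operand and result paths.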





11.3.1 Fbox Interface

This section is responsible for overseeing the protocol with the Ebox. This includes the sequence 
of receiving the opcode, operands, exceptions, and other control information, and also outputting
the result with its accompanying status. The opcode and operands are transferred from the input
interface to stage 1 in all operations except division. The result is conditionally received from
either stage 3 or stage 4. 

11.3.2 Divider 

The divider receives its inputs from the interface and drives its outputs to stage 1. It is used 
only to assist the divide operation, for which it computes the quotient and the remainder in a 
redundant format. 

11.3.3 Stage 1 

Stage 1 receives its inputs from either the interface or the divider section and drives its outputs 
to stage 2. It is primarily used for determining the difference between the exponents of the two 
operands, subtracting the fraction fields, performing the recoding of the multiplier and forming 
three times the multiplicand, and selecting the inputs to the first two rows of the multiplier array. 

11.3.4 Stage 2 

Stage 2 receives its inputs from stage 1 and drives its outputs to stage 3. Its primary uses are: 
right shifting (alignment), multiplying the fraction fields of the operands, and zero and leading 
one detection of the intermediate fraction results. 

11.3.5 Stage 3 

Stage 3 receives most of its inputs from stage 2 and drives its outputs to stage 4 or, conditionally, 
to the output. Its primary uses are: left shifting (normalization), and adding the fraction fields 
for the aligned operands or the redundant multiply array outputs. This stage can also perform a 
"mini-round" operation on the LSB's of the fraction for ADD, SUB, and MUL floating instructions. 
If the "mini-round" does not overflow, and if there are no possible exceptions, then stage 3 drives 
the result directly to the output, bypassing stage 4 and saving a cycle of latency. 

11.3.6 Stage 4 

Stage 4 receives its inputs from stage 3 and drives its outputs to the interface section. It is used 
for performing the terminal operations of the instruction such as rounding, exception detection 
(overflow, underflow, etc.), and determining the condition codes. 

11.4 Fbox - Ebox Interface 

The Fbox depends on the Ebox for the delivery of instruction opcodes and source operands and 
for the storing of results. However, the Fbox does not require any assistance from the Ebox
in executing the Fbox instructions. The Fbox macroinstructions are decoded by the Ibox just 
like any other macroinstruction and the Ebox is dispatched to an execution flow which transfers 
the source operands, fetched during S3 of the CPU pipeline, to the Fbox early in S4. Once all 
the operands are delivered, the Fbox executes the macroinstruction. Upon completion, the Fbox 
requests to transfer the results back to the Ebox. When the current retire queue entry in the 
Ebox indicates an Fbox result and the Fbox has requested a result transfer, then the result is 
transferred to the Ebox, late in S4 of the CPU pipeline, and the macroinstruction is retired in S5. 
The Fbox input interface has two input operand registers which can hold all of the data for one
instruction, and a three segment opcode pipeline. If the Fbox input machine is unable to handle 
new opcodes or operands then F%INPUT_STALL_H is asserted to the Ebox, causing the next Fbox 
data input operation to stall the CPU pipeline at the end of its S3. 

The Fbox output interface has a format mux and two result queues, the data queue and the 
control queue. The format mux is used to transform the result data into VAX storage format. 
The queues are used to hold data results and control information whenever result transfers to 
the Ebox become stalled. 

11.4.1 Opcode Transfers to the Fbox 

Whenever the Fbox indicates that it is ready to receive new information by negating
F%INPUT_STALL_H, the Ebox may initiate the next opcode or operand transfer. The Fbox receives
instructions from the Microsequencer (S2 of the CPU pipeline) on a 9-bit opcode bus. The opcode
bus is made up of the 8 msb's of the macroinstruction along with a single bit which, when
set, indicates a G data type operation (i.e., the low order macroinstruction opcode byte was FD
(hex)). The Microsequencer indicates the presence of a new opcode by asserting the opcode valid
flag, E%FBOX_1ST_CYCLE_H. This opcode valid flag is only asserted once for each new instruction.
In particular, if the Microsequencer is stalled during an opcode transfer cycle, the same
opcode may be driven for multiple cycles; however, E%FBOX_1ST_CYCLE_H is asserted for only one
of those stalled cycles. A complete list of the instructions executed by the Fbox and the opcode
received from the Microsequencer is contained in Table 11-3.
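
The composition of the 9-bit opcode bus can be sketched as below. This is an assumption-laden illustration: the function name is hypothetical, the opcode is taken as its bytes in instruction-stream order (so a G-floating opcode arrives as FD followed by the operation byte), and placing the G flag in bit 8 above the opcode byte is an arbitrary choice, since the text does not give the bit position.

```python
G_FLAG = 1 << 8  # assumed position of the G data type bit on the bus


def fbox_opcode_bus(opcode_bytes):
    """Form a 9-bit value from a one- or two-byte VAX opcode."""
    if len(opcode_bytes) == 1:
        return opcode_bytes[0]              # single-byte opcode, G bit clear
    if len(opcode_bytes) == 2 and opcode_bytes[0] == 0xFD:
        return G_FLAG | opcode_bytes[1]     # FD-prefixed opcode, G bit set
    raise ValueError("not an opcode form handled by this sketch")
```

With this encoding ADDF2 (opcode 40 hex) and ADDG2 (opcode 40FD hex) share the same low 8 bits and differ only in the G flag.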



NOTE 

The Fbox does not check for an illegal opcode. However, if an illegal opcode is received 
then the Fbox will interpret it as if it were an ADDF. No indication is given that this 
error has occurred; the Fbox simply assumes that an ADDF has been started. When the
instruction is retired (assuming that it actually was not an ADDF) it will be possible for
diagnostic software to determine that an error has occurred. This processing of illegal
opcodes is done entirely to keep the Fbox internal control signals in a predictable state 
and thus avoid any "catastrophic" failure. 

Once a valid opcode has been received from the Microsequencer, it is processed in a three-element
pipeline/queue by the Fbox input logic. The first level, I1, is a static register which feeds the
re-code PLA. The second level, I2, is the recoded opcode. The third level, I3, is the current
instruction; this register is output to both the Divider and Fbox stage 1. Any operand being sent
to the Fbox is always for the instruction that is in I3. Each level has a corresponding valid bit
which indicates the presence of an instruction to be executed. When the Fbox input is not stalled,
opcodes and operands flow in the following order:

a. Opcode from the Microsequencer is loaded into I1 during Φ4 (CPU S2).

b. Re-code PLA runs during the following Φ12.

c. Re-coded opcode is loaded into I2 at the end of Φ2.

d. I2 is loaded into I3 during the following Φ3.

e. Input operand latches are loaded during the next Φ12, at the earliest.

f. Fbox internal Data Valid is set on Φ3 following the last operand reception.

If the final data is not received during Φ12, then the I3 register stalls. This back-pressures
the Fbox input instruction pipeline: if there is a valid instruction in I2, then it will also stall.
Once I2 is stalled, I1 will stall on the next instruction from the Microsequencer. When the final
operand for the instruction in I3 is received, the stall is removed and new instructions are allowed
to advance within the input pipeline.

Besides stalling when waiting for operands from the Ebox, the input instruction pipeline stalls 
for a fixed number of cycles during MUL{D,G,L} and DIV{F,D,G} instructions. These internally 
generated stalls, termed opcode stalls, are needed to allow multiple passes in the multiply and 
the divide arrays. The opcode stalls not only keep the Fbox input pipeline from advancing, but 
also cause F%INPUT_STALL_H to be asserted back to the Ebox. 

Because an opcode stall cannot be started until all the operands for the stalling opcode have 
been received, a three level instruction pipeline/queue is needed in the Fbox input stage (refer 
to Section 11.4.3, Figure 11-3). It is possible for the Fbox to receive two additional new opcodes 
before the opcode stall can be asserted and take effect at the Ebox. These two additional opcodes, 
along with the original stalling opcode, must be held in the Fbox input stage until the stall is 
finished. 
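The back pressure through the three levels can be illustrated with a small behavioral model. This is a sketch under simplifying assumptions, not a cycle-accurate description: the function `step` and the tuple representation are hypothetical, phase timing is ignored, and a level is treated as invalid when it holds `None`.

```python
def step(pipe, new_opcode=None, i3_done=False):
    """Advance a (I1, I2, I3) tuple by one cycle; None marks an invalid level.

    i3_done models the instruction in I3 having received its final operand
    and, for MUL/DIV, having finished its opcode stall.
    """
    i1, i2, i3 = pipe
    if i3 is not None and i3_done:
        i3 = None                      # current instruction leaves the input stage
    if i3 is None and i2 is not None:
        i3, i2 = i2, None              # I2 advances only when I3 is free
    if i2 is None and i1 is not None:
        i2, i1 = i1, None              # I1 advances only when I2 is free
    if new_opcode is not None:
        if i1 is not None:
            raise OverflowError("I1 still valid: the stall should have held the opcode off")
        i1 = new_opcode
    return (i1, i2, i3)


if __name__ == "__main__":
    pipe = (None, None, None)
    for opcode in ("DIVF", "ADDF", "MULF"):
        pipe = step(pipe, opcode)
        print(pipe)
    print(step(pipe))                  # I3 stalled: nothing moves
    print(step(pipe, i3_done=True))    # stall released: queue advances
```

In this model the stalling opcode ends up in I3 with the two additional opcodes held in I2 and I1, which is the situation that motivates the three levels; a fourth opcode arriving before the stall is released has nowhere to go.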

11.4.2 Operand Transfers to the Fbox 

Source operands, which were accessed in the Ebox during S3, are transferred from the Ebox to 
the Fbox early in S4. There will always be at least one cycle between the opcode transfer and the 
corresponding operands, during which the Fbox decodes the opcode. The data type of the source 
operand, contained in the I3 register of the input instruction pipeline, is used to select the proper 
data input format. There are two 32-bit input data busses, E%ABUS_H and E%BBUS_H, which are 
used to transfer operands to the Fbox. If the instruction is either a single operand type, or an 
integer or floating F type, then all of the operands are transferred in one cycle. If the instruction 
is a floating D or G type then one complete 64-bit operand is transferred on the concatenated 
input busses at a rate of one per cycle. For a floating D or G data type, the lower longword (i.e., 
sign, exponent, and fraction MSB's) is transferred on the E%ABUS_H and the upper longword is 
transferred on the E%BBUS_H. 

Each 32-bit input operand bus has a related short literal flag which indicates the presence of 
a short literal on bits<5:0> of the corresponding bus. If a double precision operand is being 
transferred then a short literal will be detected using the flag associated with the E%ABUS_H 
and the floating short literal data will be taken from E%ABUS_H<5:0>. The remaining E%ABUS_H 
and E%BBUS_H bits are zero; however, the Fbox ignores them. When receiving an integer short 
literal, the integer is on bits<5:0> and the Fbox depends on the remaining bits of that bus being 
zero. The Fbox must transform all short literals to the proper format based on the instruction 
data type. 
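The short literal transformation can be sketched as follows. This section does not restate the literal encoding, so the sketch assumes the standard VAX short literal layout (bits <5:3> exponent, bits <2:0> fraction, representing values from 1/2 to 120) and the F_floating longword layout with the exponent in bits <14:7>. The function names are hypothetical and the code does not describe the Fbox formatting logic itself.

```python
from fractions import Fraction


def split_short_literal(literal):
    """Split a 6-bit short literal into its (exponent, fraction) fields."""
    if not 0 <= literal <= 0x3F:
        raise ValueError("short literal must fit in bits <5:0>")
    return (literal >> 3) & 0x7, literal & 0x7


def integer_short_literal(literal):
    """Integer context: the literal is used as an unsigned value 0..63."""
    split_short_literal(literal)  # range check only
    return literal


def floating_short_literal_value(literal):
    """Floating context: value = 2**exp * (8 + frac) / 16 (assumed encoding)."""
    exp, frac = split_short_literal(literal)
    return Fraction(8 + frac, 16) * 2 ** exp


def f_floating_short_literal(literal):
    """Expand to an F_floating longword: excess-128 exponent, 3 fraction MSBs."""
    exp, frac = split_short_literal(literal)
    return ((128 + exp) << 7) | (frac << 4)


if __name__ == "__main__":
    for lit in (0, 8, 63):
        print(lit, floating_short_literal_value(lit),
              format(f_floating_short_literal(lit), "08X"))
```

The integer and floating interpretations of the same six bits differ, which is why the Fbox needs the instruction data type from I3 before it can format the literal.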

When all of the input operand information for both input data busses is valid, the Ebox asserts 
an input valid flag, E%FDATA_VALID_H. If the flag is not asserted then the Fbox input machine 
enters an input stalled state. 

Along with the operands, the Ebox sends 3 different operand fault flags. These are the memory 
management, hardware error, and reserved address mode faults. Once an operand fault has been 
sent to the Fbox, it is unpredictable whether the Ebox will or will not assert the E%FDATA_VALID_H 
signal. It is also unpredictable whether or not any other outstanding operands will be sent. When 
the Fbox receives an input fault two actions take place: 

1. The Fbox asserts data valid into the Fbox pipeline. This breaks any internal stall conditions, 
thus allowing the instruction to complete. 

2. The Fbox asserts F%INPUT_STALL_H. This halts the transfer of any other operands and 
prevents the Fbox and Ebox from getting out of synchronization. This stall normally continues 
until after the faulting instruction has been retired by the Fbox. It is cleared by the assertion 
of E%FLUSH_FBOX_H or K%RESET_H. 

Since the faulting operand data values used by the Fbox are undetermined, it is possible that 
the Fbox may generate additional faults. However, the Ebox prioritizes the faults on retirement; 
the three input operand faults are at the highest priority. Therefore, any Fbox generated fault 
is ignored if the Fbox received an input operand fault. On completion, the faulting instruction 
will be handled by the Ebox in the proper order, ensuring compliance with the VAX architecture 
standard. In addition, the Ebox will flush the Fbox; this will cause F%INPUT_STALL_H to be 
negated, releasing the stalled state. 

Besides the operand fault flags, the Ebox also sends the current value of the PSL floating 
underflow enable bit, E%PSL_FU_H. If the FU bit is set then the Fbox will cause a fault on floating 
underflow. Whether the FU bit is set or clear, the Fbox will return a floating zero data value on 
the result bus if underflow is detected. 

11.4.3 Summary of Fbox Input Stage Stall Rules 

The following list is a set of input stall rules for the Fbox input stage. They center around opcode 
transfers and the actions related to the assertion and negation of F%INPUT_STALL_H. 

1. Floating opcodes are transferred from the Microsequencer to the Fbox during the CPU's S2 
cycle. There will always be at least one cycle between an opcode transfer, OPC1, and the 
first data transfer for that opcode. In addition, there can only be one new opcode transfer, 
OPC2, between OPC1 and OPC1's last data transfer. It is possible that a new opcode transfer, 
OPC3, could take place in the same cycle as OPC1's last data transfer. Refer to the following 
Figure 11-3. 

Figure 11-3: Opcode Transfers to the Fbox 

2. Assertion of F%INPUT_STALL_H implies that the next data transfer cycle will stall; i.e., 
if F%INPUT_STALL_H is asserted during a data transfer cycle then that cycle will not 
stall but the next data transfer cycle will. That next data transfer cycle cannot have 
either E%FBOX_1ST_CYCLE_H or E%FDATA_VALID_H asserted. The Ebox will repeat the 
stalled transfer cycle keeping the E%ABUS_H, E%BBUS_H, E%FDATA_VALID_H, and any faults 
unchanged. 

3. If F%INPUT_STALL_H is released in the current data transfer cycle then the current data 
transfer cycle will be repeated once more in the next cycle, this time with E%FDATA_VALID_H 
asserted. In that next cycle it is also possible to have E%FBOX_1ST_CYCLE_H asserted, 
indicating a new opcode transfer. 

11.4.4 Fbox Result Transfers to the Ebox 

Data is returned to the Ebox on one 32-bit output bus. A single integer or floating F type result 
can be returned in one cycle. Floating D/G data requires two cycles: the lower 32 bits (i.e., sign, 
exponent, and mantissa msb's) are returned in the first cycle, followed by the upper 32 bits in the 
next cycle. A two bit data length field and a two bit condition code map field are also returned 
with each result transfer, as are all of the result status bits. The data length field is used to 
indicate a result data length of Byte, Word, Longword, or Quadword. The condition code map 
field informs the Ebox which PSL condition code bits must be updated for the retiring instruction. 
If the Fbox is not trying to retire an instruction then the condition code map is forced to a value 
of "no update". For double precision results which require two transfers, the data length is set to 
Quadword during both transfers. The condition code map will be forced to a value of "no update" 
during the first transfer of a double precision result and then to the proper instruction dependent 
code during the second transfer. The other result status is broadcast during both transfers. The 
Ebox uses the result status to detect microtrap conditions before any store of result data occurs. 

The Fbox supplies 12 bits of status information with the retirement of each instruction. These 
are made up of: 

a. Operand faults received with the input operands. 

1. F%MMGT_FLT_H - memory management faults 

2. F%MERR_H - hardware read faults, etc 

3. F%RSVD_ADDR_MODE_H - Reserved Address Mode Fault 

b. Fault conditions detected by the Fbox 

1. F%RSV_H - reserved operand 

2. F%FOV_H - floating overflow 

3. F%FU_H - floating underflow 

4. F%FDBZ_H - floating divide by zero 

c. Fbox condition code values 

1. F%CC_N_H - result is negative 

2. F%CC_Z_H - result is zero 

3. F%CC_V_H - result caused an integer overflow 

4. F%CC_MAP_H<1:0> - cc update map select 

If multiple exceptions are detected by the Fbox for an instruction that it is executing then all of 
the exceptions for that instruction are reported to the Ebox. The Ebox and Microsequencer 
will prioritize these faults. The source operand faults are at the highest priority. Refer to 
Section 8.5.19.7 for the priority of the Fbox detected faults. 

There are two signals from the Ebox to the Fbox that control the transfer of results by the Fbox. 
E%RETIRE_OK_H informs the Fbox that it may be possible to retire an instruction. E%STORE_OK_H 
indicates that it is possible for the Fbox to store data. When the Fbox wants to store a result, the 
request signal F%STORE_H is asserted. Similarly, if the Fbox wants to retire an instruction then 
F%RETIRE_H is asserted. All instructions must be retired on completion; most instructions (with 
the exception of TST and CMP) also need to store data. Single precision and integer instructions 
which store a result request to both store and retire in a single transfer cycle. Double precision 
instructions which store a result need two transfer cycles: the first transfer requests only to store, 
and the second transfer requests to both store and retire. All TST and CMP instructions, regardless 
of data type, will request to retire without a store in one transfer cycle. 

The completion of a result transfer from the Fbox to the Ebox is recognized when the appropriate 
request and its corresponding OK signal are both asserted. Conversely, if the corresponding OK 
signal is not asserted then the Fbox stalls (repeats) the current transfer. 
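
The request patterns described above can be summarized in a short sketch. The names `Transfer`, `result_transfers`, and `transfer_completes` are hypothetical, and the completion rule for a combined store and retire request (both OK signals required) is an interpretation of the text rather than a statement of the implementation.

```python
from typing import List, NamedTuple, Optional


class Transfer(NamedTuple):
    store: bool                 # F%STORE_H requested in this transfer cycle
    retire: bool                # F%RETIRE_H requested in this transfer cycle
    data_length: Optional[str]  # None models the "XX" (don't care) DL of TST/CMP
    cc_map: str


def result_transfers(stores_result: bool, double_precision: bool,
                     data_length: Optional[str], cc_map: str) -> List[Transfer]:
    """Return the transfer cycles needed to retire one instruction."""
    if not stores_result:                       # TST and CMP: retire only
        return [Transfer(False, True, data_length, cc_map)]
    if double_precision:                        # D/G result: two longwords
        return [Transfer(True, False, "Quad", "No Update"),
                Transfer(True, True, "Quad", cc_map)]
    return [Transfer(True, True, data_length, cc_map)]


def transfer_completes(t: Transfer, store_ok: bool, retire_ok: bool) -> bool:
    """A transfer completes when every asserted request has its OK signal."""
    return (not t.store or store_ok) and (not t.retire or retire_ok)


if __name__ == "__main__":
    for t in result_transfers(True, True, "Quad", "All Other Floating"):
        print(t, transfer_completes(t, store_ok=True, retire_ok=False))
```

With E%RETIRE_OK_H negated, the first transfer of a double precision result can still complete in this model, while the second repeats until the Ebox allows the retire.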

When an instruction is completed by the Fbox core, the Fbox output stage transforms the data 
result back into VAX memory format. The VAX formatted data, along with ten bits of result 
status, is then always written into the output data queue. This queue has seven entries, each of 
which is 74 bits wide. The data from this queue is transferred to the Ebox on the F%RESULT_H 
bus in a first-in/first-out fashion, one longword at a time. If the data queue is empty at the 
time that the core is retiring, then the low word of the formatted data, along with the result 
status, is also selected to bypass directly to the result bus. This action is performed by the result 
multiplexer, which can select one of three sources: the queue bypass bus, the output queue low 
word, or the output queue high word. 

The data queue is written every cycle, but its input (write) pointer is only advanced after writing 
valid data. Whenever an instruction is retired, the data queue output (read) pointer is advanced. 
When the input and output pointers are selecting the same entry then the queue is empty. 
If the input pointer is only one entry ahead of the output pointer a condition called empty next 
is detected. The empty and empty next conditions are used to generate result transfer requests 
from the data queue, and also in selecting between the queue bypass bus or the queue read data. 
Because double precision results retire from the Fbox core in one cycle but require two cycles to 
be transferred back to the Ebox, the high word of a double precision result will always be sourced 
from the data queue. This allows the core to retire quadword results in consecutive cycles (which 
could happen when CVTx{D,G} instructions are executing). 
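
The pointer behavior of the data queue can be modeled as follows. This is a minimal sketch: the class name `OutputQueue` and its methods are hypothetical, the 74-bit entry contents are reduced to an opaque value, and no attempt is made to model queue-full handling or the bypass path.

```python
class OutputQueue:
    """Seven-entry circular queue with 'empty' and 'empty next' detection."""

    DEPTH = 7

    def __init__(self):
        self.entries = [None] * self.DEPTH
        self.wr = 0   # input (write) pointer
        self.rd = 0   # output (read) pointer

    def empty(self):
        return self.wr == self.rd

    def empty_next(self):
        # Input pointer is exactly one entry ahead of the output pointer.
        return (self.rd + 1) % self.DEPTH == self.wr

    def write(self, data, valid):
        # The entry is written every cycle; the pointer moves only for valid data.
        self.entries[self.wr] = data
        if valid:
            self.wr = (self.wr + 1) % self.DEPTH

    def retire(self):
        if self.empty():
            raise IndexError("no result to retire")
        data = self.entries[self.rd]
        self.rd = (self.rd + 1) % self.DEPTH
        return data


if __name__ == "__main__":
    q = OutputQueue()
    q.write("don't care", valid=False)
    print(q.empty(), q.empty_next())
    q.write("result 1", valid=True)
    print(q.empty(), q.empty_next())
```

Because invalid writes do not advance the input pointer, they are simply overwritten by the next write, so writing every cycle costs nothing in queue capacity.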

Besides the data queue, the Fbox output also has a control queue. This queue is seven bits wide 
by seven entries deep. It contains information derived from the opcode: the result data length, 
the condition code map, whether the instruction writes a result or not, and how many transfer 
cycles will be required to retire the instruction. Since the opcodes will precede the data through 
the Fbox pipeline by one cycle, there is no need to have a bypass bus for the control queue. The 
output machine is always able to write the control information into this queue and read it back 
before it is needed. Like the data queue, the control queue is written every cycle. Its input 
pointer is advanced after a new instruction has been passed through the pipeline and written 
into this queue. Its output pointer is advanced after a valid entry has been read into the control 
latch (i.e., the control queue's output latch). Because the request information is needed early in 
the transfer cycles, the control queue often is running ahead of the data queue. 

Result transfers to the Ebox can be initiated by one of three sources: from the Fbox stage 3 bypass 
request line, from a data valid in Fbox stage 4, or from the Fbox output queue. The output queue 
takes precedence over the Fbox core. If the queue is not empty then the current queue output is 
transferred to the Ebox, any concurrent results from the Fbox core are written into the output 
queue. Fbox stages 3 and 4 perform their own prioritization. If stage 4 is retiring an instruction 
then stage 3 will not attempt to bypass stage 4. Instead, stage 3 passes its unrounded result to 
stage 4 and stage 4 will retire that result in the next cycle. 

11.4.5 Fbox Pipeline Stalls 

The Fbox input can request to stall the Ebox for one of two reasons. The Ebox does not actually 
stall until the next time it is ready to transfer data to the Fbox. 

Fbox Input Stall 

1. Opcode Stalls 

2. Fault Stalls 

As was mentioned earlier at the end of Section 11.4.1, the implementation of some instructions 
requires more than one cycle of execution within some stages of the Fbox pipeline. These 
instructions require that they be followed by a sufficient number of bubbles in the pipeline such 
that they can not be overrun by succeeding instructions. In particular, MUL{D,G,L} require two 
cycles in the stage 2 multiply array, and DIV{F,D,G} require 10, 21, and 20 cycles, respectively, in the 
divide array. In order to guarantee proper operation, the Fbox input generates an input stall 
of the appropriate length for each of these instructions. The multiply stalls are controlled by a 
simple state machine in the Fbox input, it starts when all of the multiply operands have been 
received and continues for one cycle. The divide stalls are started by the input interface, as soon 
as all of the divide operands are received, and ended by a divide done signal which is received 
from the Fbox divider stage. 

Whenever the Fbox receives an operand from the Ebox for which the Ebox has signaled a fault, 
the Fbox will request an input stall. This is done because it is unpredictable whether or not 
the Ebox will complete any other outstanding data transfers for this instruction. Therefore, to 
prevent the Fbox from entering an unpredictable state, F%INPUT_STALL_H is asserted and any 
new data transfers after the faulting source operand are blocked. When the instruction with the 
faulting operand is retired the Ebox will flush the Fbox, this will release the fault stall condition. 

The Fbox output can cause a stall at the Ebox for one of two reasons: 

Fbox Output Stall 

1. Result not ready 

2. Stage 4 bypass abort 

If the Fbox does not have any results ready to retire and it is the selected source for the RMUX 
in the Ebox, then the Ebox is stalled until the Fbox is ready to transfer the result. 

Stage 3 in the Fbox has the ability to perform "mini-round" operations for floating ADD, SUB, 
and MUL instructions. When stage 3 detects that it may be possible to round its fraction result 
and bypass stage 4, then it makes a request to store data to the Fbox output interface. If the 
data queue is empty then this store request is passed on to the Ebox. Later in the same transfer 
cycle, stage 3 may detect a "mini-round" overflow or some other error condition. If this occurs 
then stage 3 signals an abort of the stage 4 bypass. If the data queue was empty then this abort 
causes F%STORE_STALL_H to be asserted to the Ebox. The current store is stalled, by the Fbox, 
for one cycle until the correct result can be obtained from stage 4. 

11.4.6 Fbox Reset and Flush 

The Fbox can be initialized by the assertion of two different signals. At powerup time K%RESET_H 
is asserted for several cycles. This signal initializes all of the instruction registers and the output 
queue pointers in the Fbox interface. Any outstanding transfers and all stalls are terminated. At 
the completion of reset the Fbox is properly initialized and ready to receive opcodes and operands. 

The Ebox can also initialize the Fbox by asserting the E%FLUSH_FBOX_H signal. This has the same 
effect as resetting the Fbox: the Fbox pipeline is cleared of all operations. Operations already 
under way anywhere in the pipeline are lost. E%FLUSH_FBOX_H is updated during phase 1 and 
it is only asserted for one cycle. The Fbox is ready to receive new opcodes in the very next cycle. 

11.4.7 Summary of Fbox-Ebox Signals 

The following signals are driven by the Ebox to the Fbox. 

• E%FLUSH_FBOX_H 

This signal causes the Fbox to clear its pipeline of all operations. 

• E%FBOX_1ST_CYCLE_H 

This signal tells the Fbox that the opcode is valid. 

• E%FOPCODE_H<8:0> 

This 9-bit opcode bus carries the 8-bit opcode byte of the macroinstruction along with a single 
bit that indicates G-type data. 

• E%FDATA_VALID_H 

This signal tells the Fbox that all data on the operand busses is valid. The Fbox knows, from 
decoding the opcode, exactly what data to expect. 

• E%ABUS_H<31:0> and E%BBUS_H<31:0> 

These 32-bit busses carry the source operand(s). 

• E%A_SHLIT_H and E%B_SHLIT_H 

These signals indicate that the data on the E%ABUS_H or the E%BBUS_H, respectively, is a 
6-bit short literal value extracted from the instruction stream. Special data formatting is 
required by the Fbox. 

• E%PSL_FU_H 

The current PSL<FU> value for use by the Fbox in deciding whether to signal floating point 
underflow faults or not. 

• E%F_MMGT_FLT_H, E%F_MEM_ERR_H, and E%F_RSVD_ADDR_MODE_H 

These signals tell the Fbox that there is a fault or error associated with the source operands. 
The Fbox carries this status down the pipeline so that it is handled after instructions which 
are already in the Fbox pipeline. 

• E%FBOX_S4_BYPASS_ENB_H 

This signal is used to control the Fbox stage 4 bypass option. Assertion of this signal enables 
stage 3 to conditionally bypass stage 4. This signal is normally cleared at system startup, 
disabling the bypass option. This signal has the additional function of selecting between 
FD1R or FD2R to be output of Stage 3 while the FBOX is in FBOX_Test mode. 

• E%RETIRE_OK_H, E%STORE_OK_H 

These signals inform the Fbox of any stalls when attempting to transfer a result to the Ebox. 

The following signals are driven by the Fbox to the Ebox. 

• F%INPUT_STALL_H 

This control signal stalls the Ebox from issuing any more operands to the Fbox. 

• F%RETIRE_H 

This control signal tells the Ebox the Fbox is attempting to retire an instruction in this cycle. 

• F%STORE_H 

This control signal tells the Ebox the Fbox is attempting to store a result in this cycle. 

• F%STORE_STALL_H 

This control signal tells the Ebox the Fbox is stalling the current store request this cycle. 

• F%RESULT_H<31:00> 

This 32-bit bus carries Fbox results to the Ebox. 

• F%FBOX_DL_H<1:0> 

This is the data length used by the Ebox for an Fbox store. 

• F%CC_N_H, F%CC_Z_H, F%CC_V_H 

These 3 signals carry Fbox condition code bits to the Ebox. They are Negative, Zero, and 
Overflow. 

• F%CC_MAP_H<1:0> 

This is the map specifier which tells the Ebox how to update the PSL condition code bits. 

• F%MMGT_FLT_H 

Signals a memory management fault for one of the currently retiring instruction's source 
operands. 

• F%MERR_H 

Signals a memory access hardware error for one of the currently retiring instruction's source 
operands. 

• F%RSVD_ADDR_MODE_H 

Signals a reserved address mode fault for one of the currently retiring instruction's source 
operands. 

• F%RSV_H 

Signals a reserved operand fault for one of the currently retiring instruction's source operands. 

• F%FOV_H 

Signals that a floating point overflow fault resulted from the currently retiring instruction. 

• F%FU_H 

Signals that a floating point underflow fault resulted from the currently retiring instruction. 

• F%FDBZ_H 

Signals that a floating point divide-by-zero fault resulted from the currently retiring instruction. 

11.4.8 Fbox Instruction Set 

The instructions listed in Table 11-3 constitute the VAX integer and floating point instructions 
supported by the Fbox datapath. 

Table 11-3: Fbox Floating Point and Integer Instructions 

Fbox                                           CC
Opc   Instruction                       NZVC   MAP   DL   Exceptions

04C   CVTBF src.rb, dst.wf              **00   10    10
06C   CVTBD src.rb, dst.wd              **00   10    11
14C   CVTBG src.rb, dst.wg              **00   10    11
04D   CVTWF src.rw, dst.wf              **00   10    10
06D   CVTWD src.rw, dst.wd              **00   10    11
14D   CVTWG src.rw, dst.wg              **00   10    11
04E   CVTLF src.rl, dst.wf              **00   10    10
06E   CVTLD src.rl, dst.wd              **00   10    11
14E   CVTLG src.rl, dst.wg              **00   10    11
048   CVTFB src.rf, dst.wb              ***0   11    00   rsv, iov
049   CVTFW src.rf, dst.ww              ***0   11    01   rsv, iov
04A   CVTFL src.rf, dst.wl              ***0   11    10   rsv, iov
068   CVTDB src.rd, dst.wb              ***0   11    00   rsv, iov
069   CVTDW src.rd, dst.ww              ***0   11    01   rsv, iov
06A   CVTDL src.rd, dst.wl              ***0   11    10   rsv, iov
148   CVTGB src.rg, dst.wb              ***0   11    00   rsv, iov
149   CVTGW src.rg, dst.ww              ***0   11    01   rsv, iov
14A   CVTGL src.rg, dst.wl              ***0   11    10   rsv, iov
04B   CVTRFL src.rf, dst.wl             ***0   11    10   rsv, iov
06B   CVTRDL src.rd, dst.wl             ***0   11    10   rsv, iov
14B   CVTRGL src.rg, dst.wl             ***0   11    10   rsv, iov
056   CVTFD src.rf, dst.wd              **00   10    11   rsv
199   CVTFG src.rf, dst.wg              **00   10    11   rsv
076   CVTDF src.rd, dst.wf              **00   10    10   rsv, fov
133   CVTGF src.rg, dst.wf              **00   10    10   rsv, fov, fuv
040   ADDF2 add.rf, sum.mf              **00   10    10   rsv, fov, fuv
041   ADDF3 add1.rf, add2.rf, sum.wf    **00   10    10   rsv, fov, fuv
060   ADDD2 add.rd, sum.md              **00   10    11   rsv, fov, fuv
061   ADDD3 add1.rd, add2.rd, sum.wd    **00   10    11   rsv, fov, fuv
140   ADDG2 add.rg, sum.mg              **00   10    11   rsv, fov, fuv
141   ADDG3 add1.rg, add2.rg, sum.wg    **00   10    11   rsv, fov, fuv

Table 11-3 (Cont.): Fbox Floating Point and Integer Instructions 

Fbox                                           CC
Opc   Instruction                       NZVC   MAP   DL   Exceptions

042   SUBF2 sub.rf, dif.mf              **00   10    10   rsv, fov, fuv
043   SUBF3 sub.rf, min.rf, dif.wf      **00   10    10   rsv, fov, fuv
062   SUBD2 sub.rd, dif.md              **00   10    11   rsv, fov, fuv
063   SUBD3 sub.rd, min.rd, dif.wd      **00   10    11   rsv, fov, fuv
142   SUBG2 sub.rg, dif.mg              **00   10    11   rsv, fov, fuv
143   SUBG3 sub.rg, min.rg, dif.wg      **00   10    11   rsv, fov, fuv
0C4   MULL2 mulr.rl, prod.ml            ***0   11    10   iov
0C5   MULL3 mulr.rl, muld.rl, prod.wl   ***0   11    10   iov
044   MULF2 mulr.rf, prod.mf            **00   10    10   rsv, fov, fuv
045   MULF3 mulr.rf, muld.rf, prod.wf   **00   10    10   rsv, fov, fuv
064   MULD2 mulr.rd, prod.md            **00   10    11   rsv, fov, fuv
065   MULD3 mulr.rd, muld.rd, prod.wd   **00   10    11   rsv, fov, fuv
144   MULG2 mulr.rg, prod.mg            **00   10    11   rsv, fov, fuv
145   MULG3 mulr.rg, muld.rg, prod.wg   **00   10    11   rsv, fov, fuv
046   DIVF2 divr.rf, quo.mf             **00   10    10   rsv, fov, fuv, fdvz
047   DIVF3 divr.rf, divd.rf, quo.wf    **00   10    10   rsv, fov, fuv, fdvz
066   DIVD2 divr.rd, quo.md             **00   10    11   rsv, fov, fuv, fdvz
067   DIVD3 divr.rd, divd.rd, quo.wd    **00   10    11   rsv, fov, fuv, fdvz
146   DIVG2 divr.rg, quo.mg             **00   10    11   rsv, fov, fuv, fdvz
147   DIVG3 divr.rg, divd.rg, quo.wg    **00   10    11   rsv, fov, fuv, fdvz
050   MOVF src.rf, dst.wf               **0-   01    10   rsv
070   MOVD src.rd, dst.wd               **0-   01    11   rsv
150   MOVG src.rg, dst.wg               **0-   01    11   rsv
052   MNEGF src.rf, dst.wf              **00   10    10   rsv
072   MNEGD src.rd, dst.wd              **00   10    11   rsv
152   MNEGG src.rg, dst.wg              **00   10    11   rsv
051   CMPF src1.rf, src2.rf             **00   10    XX   rsv

Table 11-3 (Cont.): Fbox Floating Point and Integer Instructions 

Fbox                                           CC
Opc   Instruction                       NZVC   MAP   DL   Exceptions

071   CMPD src1.rd, src2.rd             **00   10    XX   rsv
151   CMPG src1.rg, src2.rg             **00   10    XX   rsv
053   TSTF src.rf                       **00   10    XX   rsv
073   TSTD src.rd                       **00   10    XX   rsv
153   TSTG src.rg                       **00   10    XX   rsv

CC MAP: Condition Code Map 

    00 = No Update 
    01 = MOV Floating 
    10 = All Other Floating 
    11 = Integer 

DL: Result Data Length 

    00 = Byte 
    01 = Word 
    10 = Long 
    11 = Quad 
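
The two legend fields can be decoded with a simple lookup, sketched below. The dictionary and function names are hypothetical; the code values are those of the CC MAP and DL legends, and the byte counts are the usual sizes of the VAX byte, word, longword, and quadword.

```python
CC_MAP = {
    0b00: "No Update",
    0b01: "MOV Floating",
    0b10: "All Other Floating",
    0b11: "Integer",
}

DATA_LENGTH = {
    0b00: ("Byte", 1),
    0b01: ("Word", 2),
    0b10: ("Long", 4),
    0b11: ("Quad", 8),
}


def decode_result_fields(cc_map, dl):
    """Return (cc map name, data length name, size in bytes) for 2-bit codes."""
    name, size = DATA_LENGTH[dl & 0b11]
    return CC_MAP[cc_map & 0b11], name, size


if __name__ == "__main__":
    print(decode_result_fields(0b11, 0b10))  # fields listed for CVTFL
    print(decode_result_fields(0b01, 0b11))  # fields listed for MOVD
```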



11.5 DIVIDER 
11.5.1 Introduction 

The divider stage in the Fbox performs the floating point divide operations. The inputs to the 
divider stage are the divisor and the dividend operands, source data type, opcode, data valid, 
and abort from the input interface section. The divider computes the quotient, and outputs to 
stage 1 of the pipeline: the quotient as two vectors, the final remainder, also as two vectors, and 
division done signals. The divider also supplies the division done signal to the input interface 
section. The input interface stalls after issuing a divide instruction and defers further issue of 
instructions to Divider/Stage 1 until the division is completed in the divider. 

The final quotient and the final remainder are computed in the pipe stages. The sign of the final 
remainder is used for correcting the quotient. This correction is done in stage 3 of the pipeline. 
The terminal operations for floating point divide (quotient overflow, rounding), and the detection 
of floating overflow, underflow, and reserved operand are done in the pipeline stages. 

The execution time within the divider stage is data independent for divide instructions. The table 
below lists execution time within the Fbox for divide instructions. 

Table 11-4: Total Fbox execute cycles for Divide operation 

Instruction   Execution time in cycles
DIVF          17
DIVD          30
DIVG          29


The execution cycles are counted beginning with the cycle in which Fbox receives the divide 
opcode through the cycle in which Fbox retires the result to EBOX. 

A typical cycle count for the DIVD instruction would have 1 opcode transfer cycle, 1 dead cycle, 2 
operand transfer cycles, 1 divide PLA cycle, 20 divider array cycles (retires 60 bits of quotient), 
1 cycle each through stage 1, stage 2, and stage 3, and finally 2 cycles for the result transfer from 
stage 4 (lower longword) and output interface (upper longword), for a total count of 30 Fbox cycles. 
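
The DIVD breakdown can be checked arithmetically, and the same structure appears to reproduce the other two entries of Table 11-4. In the sketch below only the DIVD figures come from the text; the DIVF and DIVG breakdowns are assumptions made by analogy (one operand and one result transfer cycle for F, two each for G, and 9 and 19 array cycles on the assumption of three quotient bits per pass). The function name is hypothetical.

```python
def fbox_divide_cycles(operand_cycles, array_cycles, result_cycles):
    """Total Fbox cycles from opcode reception through result retirement."""
    opcode_transfer = 1
    dead_cycle = 1
    divide_pla = 1
    pipeline_stages = 3   # one cycle each in stage 1, stage 2, and stage 3
    return (opcode_transfer + dead_cycle + operand_cycles + divide_pla
            + array_cycles + pipeline_stages + result_cycles)


# Breakdown given in the text.
DIVD_CYCLES = fbox_divide_cycles(operand_cycles=2, array_cycles=20, result_cycles=2)

# Assumed breakdowns, by analogy with DIVD (not stated in the text).
DIVF_CYCLES = fbox_divide_cycles(operand_cycles=1, array_cycles=9, result_cycles=1)
DIVG_CYCLES = fbox_divide_cycles(operand_cycles=2, array_cycles=19, result_cycles=2)

if __name__ == "__main__":
    print(DIVF_CYCLES, DIVD_CYCLES, DIVG_CYCLES)
```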



11.5.2 Overview 

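The divider uses the Radix-2 SRT division algorithm using the following recursive relation: 

    NPR = 2 * (PR - q * D) 

    where 

    NPR is the new partial remainder, 
    PR is the current partial remainder, 
    q is the quotient digit (-1, 0, or +1), and 
    D is the divisor (assumed to be normalized). 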

The partial remainder is computed using carry save addition and the quotient is selected using 
an estimate of the partial remainder. The boundary conditions for the partial remainder and the 
estimated partial remainder are as follows: 



a. -2D <= partial remainder < 2D 

b. 0 <= Max. error < 1.0 

c. -2.5 <= estimated partial remainder < 2.0 

d. Quotient selection 

q = -1 if estimated partial remainder < (-0.5) 

q = 0 if (-0.5) <= estimated partial remainder < 0 

q = +1 if estimated partial remainder >= 0 

To compute the estimated partial remainder, condition (b) together with (c) above implies that 
a Carry Propagate Adder (CPA) of 4 bits (3 bits above the binary point and 1 bit below the binary 
point) is required. 

The division process essentially consists of the following two steps to retire each bit: 

• Compute the estimated partial remainder using the CPA, and select the quotient. 

• Compute the new partial remainder using the CSA by adding +D, -D or 0 to the partial 
remainder based on the quotient from step 1. 

Figure 11-4: Divider Array Block Diagram 

[Block diagram. The new partial remainder (NPR) from the previous row feeds a 4-bit CPA with 
quotient logic (4 bits only) and, in parallel, three carry save adders: CSA1 (PR + D), CSA2 (PR - D), 
and CSA3 (PR + 0). Their carry and sum outputs, C(+D)/S(+D), C(-D)/S(-D), and C(+0)/S(+0), feed a 
2 * (3:1) selector and shifter, which is steered by the quotient logic and drives C(NPR) and S(NPR) 
to the next row of CSAs.] 



In order to speed up the time for retiring each bit, step 1 and step 2 are performed in parallel as 
there are only three choices for the quotient. As shown in the block diagram, Figure 11-4, the 
divider array computes (PR+1*D), (PR-1*D) and (PR+0*D) for all the possible values of quotient: 
q = -1, +1, and 0, in parallel while the quotient is being calculated. The correct new partial 
remainder is selected using the computed quotient. In the divide array, there are three rows of 
CSAs. Thus three bits are retired with each pass through the divide array. 
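
The digit selection and the three-row pass can be sketched in software as follows. This is an arithmetic illustration only, not the carry save datapath: it assumes the recurrence NPR = 2 * (PR - q * D) implied by the +D/-D/0 addition followed by the shifter, it uses the exact partial remainder as the estimate (maximum error of zero), and it assumes a dividend and divisor normalized to the range 0.5 up to but not including 1.0. All function names are hypothetical.

```python
from fractions import Fraction


def select_quotient_digit(estimate):
    """Quotient selection with boundaries at -0.5 and 0."""
    if estimate < Fraction(-1, 2):
        return -1
    if estimate < 0:
        return 0
    return 1


def divide_array_pass(pr, divisor, rows=3):
    """One pass through the array; each row retires one quotient digit."""
    digits = []
    for _ in range(rows):
        # The three candidates that the CSAs form in parallel.
        candidates = {-1: pr + divisor, +1: pr - divisor, 0: pr}
        q = select_quotient_digit(pr)   # assumption: exact estimate
        pr = 2 * candidates[q]          # select, then shift left one bit
        digits.append(q)
    return pr, digits


def srt_divide(dividend, divisor, passes):
    """Return (quotient, final partial remainder) after the given passes."""
    pr = Fraction(dividend)
    d = Fraction(divisor)
    quotient = Fraction(0)
    weight = Fraction(1)                # first digit has weight 2**0
    for _ in range(passes):
        pr, digits = divide_array_pass(pr, d)
        for q in digits:
            quotient += q * weight
            weight /= 2
    return quotient, pr


if __name__ == "__main__":
    q, pr = srt_divide(Fraction(3, 4), Fraction(5, 8), passes=9)
    print(float(q), float(pr))
```

After k digits the model satisfies dividend = quotient * D + PR / 2**k exactly, so the bound on the partial remainder translates directly into a bound on the quotient error.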



11.6 Interface Signal Timing Diagrams 



11.7 Divider Operation 

For a valid divide operation, the divisor is loaded into the Divisor (DVR) register and the
dividend into the Dividend Feedback (DFB) register, both during PHI_4. The CFB is initialized to
zero. The control then sequences the datapath with the appropriate control signals to load DFB and
QM for the required number of divide steps. For the DIVF instruction, the divide array generates 27
bits of quotient. For the DIVG instruction, the divide array produces 57 bits of quotient.

Operand Transfer To Divider:

[Timing diagram not reproduced. It covers phases P3, P4, P1 + P2 over successive cycles (time marks
3.5, 7.0, 10.5, 14.0) and shows the operand on F_B%FD1_L<A2:B58> and F_B%FD2_L<A2:B58>, together
with F_B%ED1_L, F_B%ED2_L, F_B%S1_L, F_B%S2_L, F_I%DATA_VALID_L and F_I%DSEQ_START_L.
Key: I - driven by Interface.]

For the DIVD instruction, the divide array produces 60 bits of quotient. In general, since the
quotient is greater than or equal to 0.5 and less than 2.0, the number of quotient bits to generate
is the number of bits in the data type, plus one bit above the binary point and one additional bit
at the least significant end for rounding. Since the divider array has three rows, one to two
additional bits are generated.
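The quoted bit counts can be checked with a short calculation. This is a sketch only: the fraction widths (24, 56 and 53 bits including the hidden bit for F, D and G) are taken from the VAX floating formats rather than from this section, and the names `FRACTION_BITS` and `quotient_bits` are hypothetical.

```python
import math

# Fraction widths including the hidden bit (assumed from the VAX floating formats).
FRACTION_BITS = {"DIVF": 24, "DIVD": 56, "DIVG": 53}
ROWS = 3  # three quotient bits are retired per pass through the array


def quotient_bits(op):
    """Return (bits generated, passes through the array) for a divide opcode."""
    needed = FRACTION_BITS[op] + 2        # one bit above the binary point, one rounding bit
    passes = math.ceil(needed / ROWS)     # the array can only run whole passes
    return passes * ROWS, passes
```

Under these assumptions the function gives 27, 60 and 57 bits for DIVF, DIVD and DIVG, that is, one or two bits more than strictly needed.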

The divider control receives the F_I%DSEQ_START_L signal from the input interface, indicating
a valid DIV instruction. This signal should remain valid from the trailing edge of PHI_2 (input
to the Divider PLA) through to the trailing edge of PHI_4 (Divisor and Dividend Latches) coming
from the input interface. The divisor and dividend operand latches are conditioned by the
F_I%DSEQ_START_L signal. The source data type field from the input interface determines
whether the division is a DIVF, DIVD or DIVG.

At the conclusion of the required divider steps, the signals F_D_C2%DSEQ_DONEDAT4_H (to the
Input Interface) and F_D_C%DIVDONE_DAT_H (to Stage-1) are asserted. First the quotient
components are driven on F_I%FD1R_H and F_I%FD2R_H together with the exponent and sign
registers on their respective buses. Then the sum and carry vectors are driven on F_I%FD1R_H and
F_I%FD2R_H with exponents and signs.
Divider Result Transfer:

[Timing diagram not reproduced. It covers phases P1 through P4 over successive cycles (time marks
0, 3.5, 7.0, 10.5, 14.0) and shows DIVDONE_DAT (I,D; NOTE 1), F_I%FD1_H<A2:B58>, F_I%ED1_H,
F_I%S1_H, F_I%FD2_H<A2:B58>, F_I%ED2_H and a further I,D-driven signal (NOTE 2), with the
remainder sum and remainder carry following on the fraction buses.]

D - driven by Divider.

NOTE 1: divdone_dat with t_bypass_d deasserted.
NOTE 2: data valid only for quotient transfer.

The final quotient and the final remainder are computed in the pipeline stages. In stage 1, the
two parts of the quotient are added and, in the following cycle, the two parts of the remainder
are added. The final quotient requires correction if the sign of the final remainder is negative,
because one too many subtractions were performed. Thus, if the sign of the final remainder is
negative, the quotient is decremented in stage 3. If the quotient is greater than or equal to 1.0,
it is shifted down and the rounding constant is added in stage 4.

11.8 Divider Implementation 

The divider stage consists of fraction data path, control, exponent and sign sections. 

11.8.1 Divider Fraction Data Path 

The divider fraction data path is composed of the divisor register, divider array, quotient logic,
quotient/remainder selector, and the fraction data path drivers. A block diagram of the divider
fraction data path is shown in Figure 11-7. The divider fraction data path is shifted down by
three bits relative to the interface and stage 1 fraction data path as shown in the figure.
11.8.1.1 Divisor Register - DVR

The divisor register DVR<B1:B55> stores the divisor from the interface for divide operations.
The DVR register is loaded during PHI_4 when the input interface asserts F_I%DSEQ_START_L
(asserted in PHI_2 and held by the divider through PHI_4) and the divider control asserts
DVR_WR_FD1 (asserted in PHI_4). DVR<A2:A0> are forced to zero and DVR<B0> is forced
to one. The output of this register is shifted down by two bits (for topological reasons, to create
space for the Estimated Partial Remainder logic at the left of the datapath) and is used by the
divider array to compute the partial remainder in the DCSA cells. The dividend operand is also
latched in PHI_4.

11.8.1.2 Divider Array 

The divider array consists of three rows of carry save adders (CSAs), three carry propagate adders
(CPAs), and latches for the dividend and intermediate results. The cells that make up the divider
array are DCSA, DSEL CPA, LAT1, R2D, DCSAF, DFB and CPA. The least significant bits
of the array are different from the others and are described later.

11.8.1.2.1 DCSA and DSEL 

The DCSA, the carry save adder cell, computes in parallel (partial remainder + divisor),
(partial remainder - divisor) and (partial remainder + 0), corresponding to the quotient values of
-1, 1, and 0, as sum (S) and carry (C). The correct new partial remainder is selected in DSEL
using the three select lines from the CPA.

PR: Partial Remainder
S: sum input
C: carry input
D: divisor

S_PLUS0:  sum output of PR+0         C_PLUS0:  carry output of PR+0
S_PLUSD:  sum output of PR+1*D       C_PLUSD:  carry output of PR+1*D
S_MINUSD: sum output of PR-1*D       C_MINUSD: carry output of PR-1*D

SUM      = S XOR C
SANDC_L  = NOT (S AND C)
SORC_L   = NOT (S OR C)

S_PLUS0  = SUM
S_PLUSD  = NOT ((D AND SUM) OR (NOT D AND NOT SUM))
S_MINUSD = NOT (S_PLUSD)

C_PLUS0  = NOT (SANDC_L)
C_PLUSD  = NOT ((D AND SORC_L) OR (NOT D AND SANDC_L))
C_MINUSD = NOT ((NOT D AND SORC_L) OR (D AND SANDC_L))
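As a cross-check of the cell equations, the sketch below evaluates them on single bits and compares each sum/carry pair with an ordinary full adder whose third input is 0, D or NOT D. It is a minimal model for illustration; the function names `dcsa`, `inv` and `full_adder` are hypothetical and the signal names are spelled as reconstructed from the equations above.

```python
def inv(x):
    """Logical NOT on a single bit."""
    return 1 - x


def dcsa(s, c, d):
    """Evaluate the DCSA equations for single-bit inputs s, c, d."""
    total = s ^ c                       # SUM
    sandc_l = inv(s & c)                # SANDC_L
    sorc_l = inv(s | c)                 # SORC_L
    s_plusd = inv((d & total) | (inv(d) & inv(total)))
    return {
        "S_PLUS0": total,
        "S_PLUSD": s_plusd,
        "S_MINUSD": inv(s_plusd),
        "C_PLUS0": inv(sandc_l),
        "C_PLUSD": inv((d & sorc_l) | (inv(d) & sandc_l)),
        "C_MINUSD": inv((inv(d) & sorc_l) | (d & sandc_l)),
    }


def full_adder(a, b, cin):
    """Reference full adder: returns (sum, carry)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)
```

For every input combination the PR+0, PR+1*D and PR-1*D outputs match a full adder fed with 0, D and NOT D respectively; the extra one needed to complete the two's complement of D is supplied separately at the least significant position.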

The inputs to the first row of the divider array are DVR, SFB_H, CFB_H. During the first step of 
the divide, the SFB and CFB contain the dividend and zero respectively and during subsequent 
steps they carry the outputs of the third row. The second row of the divider also uses the DCSA 
and DSEL cells. 

In the least significant bits of the array, since the S vector is shifted left by 1 and the C vector
is shifted left by 2, the S and C inputs to the DCSA are zero except for the first step of the
division. For the first step of the division, the least significant bit contains dividend <B55>.
For the computation of PR-1*D, the divisor is complemented and a one is forced in the C input
position (to complete the 2's complement) as illustrated in Table 11-5.
Table 11-5: CSA Inputs

CSA Ports     PR+0     PR+1*D      PR-1*D

Input S       S        S           S
Input C       0        0           1
Input D       0        D           NOT D
Output S      S        S XOR D     S XOR D
Output C      0        S AND D     S OR (NOT D)

11.8.1.2.2 LAT1 

The outputs of the first row are latched every cycle in the LAT1 cell to avoid corrupting the third
row inputs. The LAT1 cell is also used to latch the select lines from the row 1 CPA in bit position
<B56> for the formation of the quotient. The LAT1 outputs are shifted left, S by one and C
by two, to form the 2*partial remainder for the second row DCSA. During reset, LAT1 is loaded
with the row 1 outputs to prevent illegal data from making multiple select lines valid in the second
and third rows of the divider array.

11.8.1.2.3 R2D and DCSAF

The cell R2D buffers the outputs of the second row and consequently the S and C vectors for the 
third row are asserted low. The cell DCSAF used for the third row is similar to the DCSA cell 
except that it takes S and C in complement form. 

11.8.1.2.4 DFB and SHF

The DFB register contains static latches that hold the S and C outputs from the third row of the
divider array and that store the dividend. The dividend is loaded into SFB from the input bus during
PHI_4 using the control signal DFB_WR_FD2 and RESET_H, while the CFB is cleared. The S and C
vectors from the third row are loaded into DFB using the control signal DFB_WR_R3 at the end
of each pass through the array. The outputs of the DFB cell, SFB_H and CFB_H, are fed back to
the first row of the array for the next pass. In addition, at the end of the required division steps,
the DFB holds the final remainder to be transmitted to stage 1. The sign of the final remainder
is used to correct the final quotient. Since the sign is derived from the <A0> bit of the stage 1
adder, the final remainder is shifted down and buffered. The SHF cell accomplishes this and its
outputs are RSR_L and RCR_L.
Figure 11-7: Divider Fraction Data Path

11.8.1.2.5 CPA

The CPA in each row of the divider array computes the estimated partial remainder (EPR) and
generates the three select lines for selecting one of PR+0, PR+D and PR-D in the array. The
inputs to the CPA are the four MSBs of S and C from the divider array. The CPA is implemented
as a carry select adder as shown in Figure 11-8. The carry select adder computes the sign of the
EPR, SIGN_H, and the zero detect logic detects if the 4-bit sum is exactly -0.5 (1111#2), Z_H. The
three select lines are derived as follows:



ESTIMATED PARTIAL REMAINDER     ACTION

011.1                           SELECT PR+0 OUTPUT   (NOT POSSIBLE)
010.0, 010.1, 011.0             SELECT PR-D OUTPUT   (NOT POSSIBLE)
001.X                           SELECT PR-D OUTPUT
000.X                           SELECT PR-D OUTPUT
111.1                           SELECT PR+0 OUTPUT
111.0                           SELECT PR+D OUTPUT
110.X                           SELECT PR+D OUTPUT
101.1                           SELECT PR+D OUTPUT
101.0, 100.X                    SELECT PR+D OUTPUT   (NOT POSSIBLE)

SEL_ZE_R*_H - select PR+0 output
SEL_PD_R*_H - select PR+D output
SEL_MD_R*_H - select PR-D output



:::r i_h aihj zzsi:_t. 

NOT SIGN H 



The three select lines are also used to form the quotient. 
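The selection rule in the table above can be expressed compactly in software. The sketch below is an illustration only: it treats the estimate as a 4-bit two's complement pattern with one fraction bit, covers just the patterns that are not marked as not possible, and uses the hypothetical names `epr_value` and `select_output`.

```python
def epr_value(epr):
    """Interpret a 4-bit pattern xxx.x (0..15) as a signed value with one fraction bit."""
    signed = epr - 16 if epr & 0b1000 else epr
    return signed / 2


def select_output(epr):
    """Return which candidate is selected for a reachable 4-bit estimate."""
    sign = (epr >> 3) & 1          # sign of the estimate
    zero = epr == 0b1111           # estimate is exactly -0.5
    if zero:
        return "PR+0"
    return "PR+D" if sign else "PR-D"
```

For example, `select_output(0b0011)` (001.1) gives "PR-D", `select_output(0b1111)` (111.1) gives "PR+0" and `select_output(0b1011)` (101.1) gives "PR+D".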



Figure 11-8: CPA Block Diagram 
11.8.1.3 Quotient Recoding and Quotient Registers

11.8.1.3.1 QS21 and QREC

The select lines SEL_PD_R* and SEL_MD_R* indicate the selected quotient value. On each pass
through the array, three pairs of quotient bits are generated. These can be expressed as the
number of additions and the number of subtractions performed. These bits need to be accumulated
in two shift registers. The final quotient is the total number of effective subtractions performed.

In order to minimize the number of bits to accumulate and to reduce the shift register bits, 
the three pairs of quotient bits from each pass through the divider array are encoded into four 
bits. The encoding is accomplished by generating the magnitude of the number of subtractions 
in each pass as three bits and a carry bit if the number of additions is greater than the number 
of subtractions. These four bits, instead of the six bits before the encoding, are accumulated in 
the shift register QM/QS. The carry vector after shifting left by 1 is subtracted from the number 
of effective subtractions to form the final quotient. 
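One way to read this encoding is sketched below. It is an interpretation, not a description of the QS21/QREC logic: each pass is taken to deliver three signed digits (+1 for a subtraction, -1 for an addition, most significant first), the three-bit field is taken modulo 8, and the carry bit is assumed to sit at the top bit position of its group so that a left shift by one aligns it with the next group. All function names are hypothetical.

```python
def encode_pass(digits):
    """Encode three quotient digits (each -1, 0 or +1) as a 3-bit field and a carry bit."""
    q2, q1, q0 = digits
    value = 4 * q2 + 2 * q1 + q0          # net effect of the pass, -7..7
    qsub = value % 8                      # three bits accumulated per pass
    carry = 1 if value < 0 else 0         # additions outweigh subtractions
    return qsub, carry


def accumulate(passes):
    """Shift the encoded fields of successive passes into two registers."""
    subs = carries = 0
    for digits in passes:
        qsub, carry = encode_pass(digits)
        subs = (subs << 3) | qsub
        carries = (carries << 3) | (carry << 2)   # assumed position: top bit of the group
    return subs, carries


def final_quotient(subs, carries):
    """Subtract the carry vector, shifted left by one, from the accumulated fields."""
    return subs - (carries << 1)
```

With this reading, four bits per pass are enough to recover the same value as the six raw select bits.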

Since the row 3 computation is done last, two sets of quotient bits are generated from the first
two rows, one for each possibility, and the final quotient bits are selected based on the row 3
quotient bits. The cell QS21 performs the recoding and generates QSB21, QSB20, QSB10 (QSB11),
QCA1 and QCA0.
[Table not reproduced. The surviving fragments refer to the subtraction done in each row of the
array.]

X2    = S2 XOR A2

QSB21 = NOT (X2 XOR S1)        QSB20 = X2 XOR A1
QSB11 = S1 XOR A1              QSB10 = NOT QSB11

QSB0  = S0 OR A0

QCA1  = A2 OR (NOT S2 AND NOT S1)
QCA0  = A2 OR (NOT S2 AND A1)



The cell QREC selects the final quotient bits. Its outputs are QSUB_H<2:0> and QC_L<0>,
corresponding to the effective subtractions and the carry from one pass through the array. These
bits are shifted in to accumulate the final quotient in the QM/QS cells.

11.8.1.3.2 QM and QS registers

QM and QS form a master/slave shift register that holds the two components of the quotient:
the number of subtractions performed and the carry vector respectively. After each pass through
the array the quotient bits are loaded into QM at various positions depending on the data type.
For the DIVF instruction the quotient bits are shifted into bit <B25>. For the DIVD instruction
the quotient bits are shifted in at position <B58>. For the DIVG instruction the quotient bits are
shifted in at position <B55>. The quotient carry component, QC, is shifted left by one position
when it is loaded into QM. The QM register is initialized to zero before beginning a new divide
instruction so that the pipeline stages can operate on all the bits of the quotient. The QM register
is loaded either from QSUB<2:0> and QC<0> or from the slave QS after a shift of three
bits in PHI_4. The QS latch is loaded every PHI_2. The QM cells use six control signals to clear,
load or shift in the data. These control signals are derived as shown in Table 11-6.
Table 11-6: QM Cell Control Signals

                                  Operation
Bit Positions       INIT     DIVF     DIVD     DIVG     DONE     Cells

A0:B22,B26:B52      CLEAR    SHFL     SHFL     SHFL     NOP      QMC,QMFC,QMGC
B23:B25             CLEAR    FLOAD    FSHF     FSHF     NOP      QMF
B53:B55             CLEAR    NOP      SHFD     GLOAD    NOP      QMG
B56:B58             CLEAR    NOP      DLOAD    NOP      NOP      QMD

Control Signals *

DQM_SHFL            0        1        1        1        0
DQM_FLD             0        1        0        0        0
DQM_FSHF            0        0        1        1        0
DQM_DLD             0        0        1        0        0
DQM_GLD             0        0        0        1        0
DQM_CLR             1        0        0        0        0

* - asserted HIGH.

During reset, all the above control signals except DQM_CLR are deasserted. 



In order to simplify the stage 1 control, the ones complement of the QC component is transferred 
to stage 1 so that stage 1 adder performs the same operation for both the final quotient and the 
final remainder computation. 

11.8.1.3.3 QSEL and TSF 

The QSEL selects the divider results to be driven to the stage 1 fraction data path. At the
end of the required division steps, first the two components of the quotient are selected and
in the following cycle the RSR and RCR of the final remainder are selected using the control
signals DIV_SEL_REM_*. Since the carry component of the quotient is only one bit per three
quotient bits, zeros are forced into the other two bits. The TSF cell consists of a tristate driver
that drives the divider results on the F_B%FD1_L and F_B%FD2_L busses during PHI_2 and PHI_3
using the control signal F_D_C2%DIVDONE_DATF_H. The TSF also contains buffers to drive
F_I%FD1R_H and F_I%FD2R_H to stage 1.

11.8.2 Divider Control

The divider control is responsible for all sequencing and control of the divider data path. It
gets F_I%DSEQ_START_L, SRC_DT_H<1:0>, F_I%DATA_VALIDR_H and F_I%ABORT_H from
the Input Interface. The divider control generates all control signals for the data path, the
F_D_C2%DSEQ_DONEDAT4_H signal for the input interface and F_D_C%DIVDONE_DAT_H
to stage 1 of the pipeline. The early signal F_D_C2%DSEQ_DONEDAT4_H to the input interface
stays valid through two cycles for both the quotient and remainder transfers.

The F_I%DSEQ_START_L signal obtained from the input interface must be valid by the trailing
edge of PHI_2. A latched version of this signal is used in PHI_4 to latch in the divisor and
dividend.

11.8.2.1 Divider Control Blocks 

The divider control consists of the control sequencer and miscellaneous logic, source data type 
latches, and buffers for driving the various control signals to the fraction data path. 

11.8.2.1.1 Control Sequencer 

The control sequencer is implemented as a PLA. The inputs to the PLA are the latched versions of
F_I%ABORT_H, F_I%DSEQ_START_L, F_I%SRC_DT_H<1:0> and state information. The PLA
essentially implements a counter and a sequencer to control the data path. The divider control
stays in the NOP state until a valid divide opcode and valid operands are received. The signal
F_I%DSEQ_START_L obtained from the input interface combines these two conditions.
The state transition table shows the sequencer state, inputs and outputs.

Figure 11-9: Divider Sequencer State Transition Table 
The divider control PLA has 10 inputs, 14 outputs and 26 minterms. These numbers include one
spare input, one spare output and three spare minterms.

11.8.2.1.2 Opcode Information Latches

The divider latches the source data type signal from the input interface. If the divider is not busy,
then the source data type information is latched into a static PHI_2 latch (cell LS). The output
of this latch is used as an input to the sequencer PLA.

11.8.2.1.3 Divider Behavior during ABORT

The divider starts execution upon receipt of the F_I%DSEQ_START_L signal from the Input
Interface. Assertion of F_I%ABORT_H from the Input Interface while the divider is retiring
quotient bits automatically forces the divider to reset its control sequencer to its initial NOP
state and to maintain the data valid enable in its deasserted state. It is expected that the Input
Interface also deasserts the F_I%DATA_VALIDR_H signal during the ABORT cycle.

Assertion of the F_I%ABORT_H signal from the Input Interface during quotient and result
transfers to Stage-1 also stops the divider from driving the F_D%DATA_VALIDR_H
line to Stage-1. As above, it is expected that the Input Interface also deasserts the
F_I%DATA_VALIDR_H signal during the ABORT cycle.

11.8.2.1.4 Data path Control Drivers 

The various control signals to the data path are combined with the appropriate clock signals and 
driven to the data path. 

11.8.2.2 Summary of Divider Stage Outputs 

The following table shows the divider stage outputs for the divide operations: 

Table 11-7: Divider Output Stages

Instruction     Divider Outputs

                Q(C,S)                    R

DIVF            Q(C,S)<A0:B25>=Q          Remainder
                Q(C,S)<B26:B58>=0

DIVD            Q(C,S)<A0:B58>=Q          Remainder

DIVG            Q(C,S)<A0:B55>=Q          Remainder
                Q(C,S)<B56:B58>=0

Q(C,S) - Quotient vectors QC, QS
R      - Remainder vectors, carry and sum



NOTE:

• The divider stage saves the exponent and the sign parts of the operands and passes them
unchanged during the result transfer.

• Floating divide by zero, reserved operand, floating overflow and underflow are not detected
by the divider stage. In these cases, the Q(C,S) and R outputs are undefined.

• The control outputs generated by the divider stage, the DIVDONE_DAT_H and
DSEQ_DONEDAT4_H signals, are deasserted for non-divide operations.

11.8.2.3 Data Valid Logic 

The divider output signal F_D%DATA_VALIDR_H driven to Stage-1 is the logical OR of the
F_I%DATA_VALID_H signal from the Input Interface and the F_D_C_DV%EN_H signal from
the Divider. These signals are mutually exclusive. The Input Interface deasserts its data valid
after issuing a divide instruction and awaits the F_D_C2%DSEQ_DONEDAT4_H signal from
the Divider before it asserts the data valid again. The presence of the global ABORT signal
F_I%ABORT_H disables the driving of the F_I%DATA_VALIDR_H signal by the Divider.

F_D%DATA_VALIDR_H = NOT (F_I%DATA_VALID_L AND F_D_C_DV%EN_L)

11.8.3 Exponent and Sign Data Path 

The exponent data path in the divider consists of registers to save the exponents and signs of 
the divisor and the dividend. The divider does not operate on the exponent and sign parts of 
the divisor and the dividend. The exponents and signs are saved to pass them to stage 1 of the 
pipe along with the quotient and final remainder components so that for floating point divide 
operations, the exponent result and exception conditions can be detected. 

The LI cell is a static latch and is loaded with sign and exponent data from the interface during
PHI_4 if a valid F_I%DSEQ_START_L is detected. At the end of the divide operation the exponent
and sign data are driven to the stage 1 exponent data path. The cell TSE contains the tristate
driver and the driver. The exponent and sign data, as in the case of the fraction data path, are
actively driven during PHI_2 and PHI_3 using the control signal F_D_C2%DSEQ_DONEDAT_H and
PHI_23.

11.9 Stage 1 

Stage 1 of the pipeline is primarily used to perform the addition of the two inputs, or to compute 
the encoded shift amount, or to perform the recoding for the multiplier array, generate the initial 
partial product, select the row one input to the multiplier and the row two input to the multiplier 
in stage 2. Stage 1 receives its inputs from either the interface section or the divider section. All 
outputs of stage 1 are driven to stage 2 of the pipeline. The sign of the adder result is driven to 
stage 3 as well as stage 2. Stage 3 requires the sign of the remainder, for the divide operation, 
to determine if the quotient result should be incremented. 

The fraction datapath portion of stage 1 primarily consists of an input selector, an adder, the 
multiplier recoder, and two output selectors. The adder in stage 1 is used for many functions. 
For multiply operations it is used to compute three times the multiplicand, for quotient operations 
it is used for adding the sum and carry vectors for the quotient; for other operations it is used to 
add two vectors. 

The recoder logic is used to select the appropriate bits of the multiplier and recode them. The 
recoded bits are inputs to the multiplier array in stage 2. 
The exponent datapath of stage 1 primarily consists of an input selector, two adders, detection
logic, and an output selector. The main purpose of the exponent section in stage 1 is to compute 
the exponent difference. The detection logic is used to determine the range of the exponent 
difference. 

The sign datapath portion in stage 1 performs no operation on the sign bits. They are passed 
unchanged to stage 2. 

11.10 Section Implementation Description 



11.10.1 Fraction Datapath 

Figure 11-10 is a top level block diagram of the Fbox stage 1 fraction datapath.

Figure 11-10: Fraction Datapath Block Diagram

Table 11-8 lists what is required to be loaded into the stage 1 fraction datapath registers,
FD1R and FD2R, for each operation.



Table 11-8: Stage 1 Fraction Register Operations

Category   Operation                         Condition

F0         FD1R <- OP1 - (OP2) + 1           Effective SUB (DeltaE=0), CMP
F1         FD1R <- OP1 - SHR1(OP2) + 1       Effective SUB (DeltaE= +1)
F2         FD1R <- -SHR1(OP1) + OP2 + 1      Effective SUB (DeltaE= -1)
F3         FD1R <- OP1                       Effective ADD or effective SUB (DeltaE > 1), and
                                             ED1R < ED2R
F4         FD1R <- OP2                       Effective ADD or effective SUB (DeltaE > 1), and
                                             ED1R >= ED2R
F5         FD1R <- OP1 + OP2                 DIV (after the divide array operation, done once for
                                             the quotient and once for the remainder) *
F6         FD1R <- OP1 + 0                   CVTfi, CVTff, MOV, MNEG, TST, and CVTif (if input
                                             integer is positive)
F7         FD1R <- -(OP1) + 0 + 1            CVTif (if input integer is negative)
F8         FD1R <- OP2 + SHL1(OP2)           MUL, MULL
F9         FD2R <- OP1                       Effective ADD or effective SUB (DeltaE > 1), and
                                             ED1R >= ED2R
F10        FD2R <- OP2                       Effective ADD or effective SUB (DeltaE > 1), and
                                             ED1R < ED2R, or MUL, MULL

* - The divider supplies stage 1 with QA. This allows the stage 1 adder to perform the same
operation on the quotient and the remainder inputs.



11.10.2 Integer Overflow - IOVF 

The integer overflow logic in stage 1 is used to help facilitate the detection of an integer overflow 
condition during a CVTFI operation. 

11.10.3 Input Selector - ISEL 

ISEL consists of two 3 to 1 selectors. The inputs to the A selector are FD1R%I<bI>,
FD1R%I<bI+1>, and FD2R%I<bI-1>. The inputs to the B selector are FD2R%I<bI>,
FD2R%I<bI+1>, and zero. Both selectors can invert the selected input.

11.10.4 Adder 

The adder uses two 61-bit inputs to derive a 62-bit result. The 61-bit inputs have two bits above 
the binary point and 59 bits below; the 62-bit result has an additional bit above the binary point. 

The main carry acceleration technique used is carry select. The adder is broken up into nine 
small groups, with all but the least significant group having duplicate carry chains. These carry 
chains operate in parallel in the first half of the stage 1 cycle. Propagate and generate logic 
operates before the carry chains. These parts of the adder are fully static. 
In the second half of the cycle, the sum logic executes. Just as for the carry logic, there is
duplicate sum logic for all groups except the least significant one. The carry-out signals are used
to select the correct sum values. These parts of the adder are also fully static. The carry in to
bit position <B58> is set directly by the stage 1 control.
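The carry select idea can be illustrated with a small bit-level model. This is a sketch under assumptions: the group size of seven bits is chosen only because it splits 61 bits into nine groups, and the model prepares both outcomes for every group, whereas the text above says the least significant group has a single chain. The name `carry_select_add` is hypothetical.

```python
def carry_select_add(a, b, carry_in=0, width=61, group=7):
    """Add two width-bit values group by group, selecting between precomputed sums."""
    result = 0
    carry = carry_in
    for low in range(0, width, group):
        size = min(group, width - low)
        mask = (1 << size) - 1
        ga = (a >> low) & mask
        gb = (b >> low) & mask
        # Both outcomes are prepared without waiting for the incoming carry ...
        sum0 = ga + gb            # result if the carry into this group is 0
        sum1 = ga + gb + 1        # result if the carry into this group is 1
        # ... and the real carry then only selects one of them.
        chosen = sum1 if carry else sum0
        result |= (chosen & mask) << low
        carry = chosen >> size
    return result | (carry << width)   # the extra bit above the inputs
```

The returned value has one more bit than the inputs, matching the 61-bit in, 62-bit out arrangement described for the adder.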

11.10.5 Recoder Selector - RSEL

RSEL is a 2 to 1 selector which selects either F_I%FD1R_H<a0:b28> or F_I%FD1R_H<b26:b55>.
When F_1_C%MRW_UPPER_H is asserted, bits <a0:b28> are selected, and when
F_1_C%MRW_UPPER_H is deasserted, bits <b26:b55> are selected.



11.10.6 SRECODER 

The srecoder uses the radix 8 modified Booth algorithm to compute the recoded sign bits of
the partial products. The srecoder receives F_I%FD1R_H<a0:b26> as an input and outputs 9
recoded sign bits, F_1_R%SREC_H<8:0>. If either F_1_E%E1Z_H or F_1_E%E2Z_H is asserted,
the srecoder will force the outputs to a one. The recoded sign bit is asserted when the partial
product is positive.

11.10.7 Multiplier Two's Complement Register - MTCR<18:0>

The MTCR register is a 19 bit 2 to 1 selector and register. When F_1_C%MRW_UPPER_H is 
asserted the A inputs to the selector are selected and when F_1_C%MRW_UPPER_H is deasserted 
the B inputs to the selector are selected. Bit zero of the A input is tied to VDD, bits <9:1> are 
driven by the SRECODER, and bits <18:10> are tied to VDD. Bits <9:0> of the B input are driven 
by the RECODER and bits <18:10> are driven by the SRECODER. 

11.10.8 Recoder 

There are 31 inputs to the recoder: F_1_R%RSEL_H<29:0> and zero. The least significant bit
of the recoder input is always zero. The recoder performs the recoding using the radix 8 modified
Booth algorithm. The recoder generates 60 recoded bits. They are F_1_R%MREC_H<59:0>.
Of the 60 bits, 18 are used in stage 1. F_1_R%MREC_H<5:0> are used to select the
MIPP. F_1_R%MREC_H<11:6> are used to select the row one input to the multiplier array.
F_1_R%MREC_H<17:12> are used to select the row two input to the multiplier array. If either
F_1_E%E1Z_H or F_1_E%E2Z_H is asserted, the recoder will force the recoder outputs to recode
zero.
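The arithmetic behind radix 8 modified Booth recoding can be sketched as follows. This is an illustration of the general technique, not of the recoder's actual output encoding: the 30 input bits plus the appended zero are examined in overlapping windows of four bits, and each window yields a signed digit between -4 and +4. The name `booth_radix8_digits` is hypothetical, and the input is treated as a two's complement value.

```python
def booth_radix8_digits(multiplier, bits=30):
    """Recode a bits-wide two's complement value into digits in -4..+4, lowest digit first."""
    padded = multiplier << 1                  # append the always-zero least significant bit
    digits = []
    for i in range(0, bits, 3):
        window = (padded >> i) & 0b1111       # four bits, overlapping the previous window by one
        b0, b1, b2, b3 = [(window >> k) & 1 for k in range(4)]
        digits.append(-4 * b3 + 2 * b2 + b1 + b0)
    return digits
```

Thirty input bits give ten digits; sixty recoder outputs would be consistent with six select lines per digit, although that grouping is an inference from the bit ranges quoted above. Each digit selects zero or plus/minus 1X, 2X, 3X or 4X the multiplicand.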



11.10.9 PHI_4 LATCHES 

The PHI_4 latches are used to latch F_I%FD1R<a2:b58>, F_I%FD2R<a2:b58>,
F_1_R%MREC_H<59:0>, and F_1_R%SREC_H<8:0>.

11.10.10 Recoder Register - MRECR[0:6]<5:0>

The MRECR register is a 42 bit register. The latch is written every cycle with the upper 42 bits 
of the recoder output (F_1_R%MREC_3R_H<59:18>). The output of the register is driven to stage 
2 as MRECR[0:6]<5:0>. 

11.10.11 Multiplier Initial Partial Product Selector and Register - MIPPR 

The MIPP selector is a 1 of 5 selector. It uses the RECODER output bits F_1%MREC_3R_H<5:0>
to select the initial partial product. The inputs to the selector are: plus/minus (1X, 2X, 3X, 4X)
the multiplicand, and zero. The selected input will be latched at the end of the stage 1 execute
cycle.

11.10.12 Multiplier Row 1 Selector and Register - MRW1R 

The MRW1R selector is a 1 of 5 selector. It uses the RECODER output bits
F_1%MREC_3R_H<11:6> to select the row 1 input to the multiplier array. The inputs to the
selector are: plus/minus (1X, 2X, 3X, 4X) the multiplicand, and zero. The selected input will be
latched at the end of the stage 1 execute cycle.

11.10.13 Multiplier Row 2 Selector and Register - MRW2R 

The MRW2R selector is a 1 of 5 selector. It uses the RECODER output bits
F_1%MREC_3R_H<17:12> to select the row 2 input to the multiplier array. The inputs to the
selector are: plus/minus (1X, 2X, 3X, 4X) the multiplicand, and zero. The selected input will be
latched at the end of the stage 1 execute cycle.

11.10.14 Selector and Register - FD1R

The FD1R selector is a 4 to 1 selector. The inputs to the selector are FD1_3R, FD2_3R, the output
of the adder, and zero. The selected input is latched at the end of the stage 1 execute cycle.

11.10.15 Selector and Register - FD2R

The FD2R selector is a 3 to 1 selector. The inputs to the selector are FD1_3R, FD2_3R, and zero.
The selected input is latched at the end of the stage 1 execute cycle.

Figure 11-11 is a block diagram of the recoder logic in stage 1 of the Fbox fraction datapath.

11.11 Exponent Datapath

11.11.1 Stage 1 Exponent Processor Block diagram 

Figure 11-12 is a block diagram of the exponent processor logic in stage 1.

Figure 11-12: Stage 1 Exponent Processor Block diagram

11.11.2 Exponent Adders

The operations performed by the adders in stage 1 are listed in Table 11-9. E1 refers to
F_I_E%ED1R_H, E2 refers to F_I_E%ED2R_H, and K refers to the constants generated in the
control section of stage 1.



Table 11-9: Exponent Adder Operations

Category   Adder #1     Adder #2     Condition

E0         -            -            CVTif, MULL
E1         E1 - E2      E2 - E1      SUBf, ADDf, CMPf
E2         -E1 + E2     -            DIVf
E3         -E1 + K      -K + E1      CVTfi
E4         E1 - K       -            CVTff, MOVf, MNEGf
E5         E1 + E2      -            MULf
E6         E1 + K       -            TSTf

11.11.3 Constants 

The constants are driven from the control section into the exponent datapath. The constants
needed for stage 1 are listed below.

0000000000000 = 0    ; TSTf
0000010000000 = 128  ; CVTff, MOV, MNEG {F,D}
0010000000000 = 1024 ; CVTff, MOV, MNEG {G}
0000010111000 = 184  ; CVTfi {F,D}
0010000111000 = 1080 ; CVTfi {G}

Table 11-10 shows the required carry-in to the exponent adders.
Table 11-10: Exponent Adder Carry-in Operations

Category   Cin E_AD1   Cin E_AD2

E0         d           d
E1         1           1
E2         1           d
E3         1           1
E4         1           d
E5         0           d
E6         0           d

d = don't care



11.11.4 Zero Detection 

The zero detectors check to see if an exponent operand has a value of zero. They are enabled by
ENA_E1Z and ENA_E2Z. The detection is done in the second half of the execute cycle and driven
into the control as E1Z and E2Z. E1Z detects zero on ed1r and E2Z detects zero on ed2r.

11.11.5 Exponent Adder 1

The exponent adder is a 13-bit static adder used to add or subtract two inputs. Each input is 
passed through a 2 to 1 selector and inversion logic prior to the adder. 

INP_1A can be selected from ED1R or K. If ISEL1_ED1R_A is asserted, then ED1R is passed
through the selector. If ISEL1_K_A is asserted, then K is passed through the selector. Inversion
of the adder input is then done based on the assertion of INVERT_EA_AD1.

INP_1B can be selected from ED2R or K. If ISEL1_ED2R_B is asserted, then ED2R is passed
through the selector. If ISEL1_K_B is asserted, then K is passed through the selector. Inversion
of the adder input is then done based on the assertion of INVERT_EB_AD1.

The adder also contains a carry-in to the LSB cell, CIN_E_AD1_H. The carry-in is primarily used
for performing subtraction operations.

Since the adder is static, it begins its operation when the input data is valid at the start of the 
stage 1 execute cycle. Intermediate results in the exponent adder are latched in the second half 
of the execute cycle and sent to the detection logic and output selector. 
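
The inversion and carry-in arrangement described above can be sketched behaviorally as follows.
This is a minimal Python model under stated assumptions, not the circuit: the names
exponent_adder and subtract are hypothetical, and the 2 to 1 input selectors are represented
simply by the caller choosing which values (an exponent register or K) to pass in.

```python
MASK13 = (1 << 13) - 1  # the adder is 13 bits wide

def exponent_adder(a: int, b: int, invert_a: bool = False,
                   invert_b: bool = False, cin: int = 0) -> int:
    """Optionally invert each input, then perform a 13-bit add with carry-in."""
    if invert_a:
        a = ~a & MASK13
    if invert_b:
        b = ~b & MASK13
    return (a + b + cin) & MASK13

def subtract(a: int, b: int) -> int:
    """Form a - b as a + ~b + 1, which is why subtraction needs the carry-in."""
    return exponent_adder(a, b, invert_b=True, cin=1)
```

For example, subtract(200, 128) yields 72, while subtract(128, 200) yields a 13-bit two's
complement value with bit 12 set.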

11.11.6 Exponent Adder 2

Exponent adder 2 is almost identical to exponent adder 1. The only real difference is found in
the input selection logic.

INP_2A can be selected from ED2R or K. If ISEL2_ED2R_A is asserted, then ED2R is passed
through the selector. If ISEL2_K_A is asserted, then K is passed through the selector. Inversion
of the adder input is then done based on the assertion of INVERT_EA_AD2.

INP_2B can be selected from ED1R or K. If ISEL2_ED1R_B is asserted, then ED1R is passed
through the selector. If ISEL2_K_B is asserted, then K is passed through the selector. Inversion
of the adder input is then done based on the assertion of INVERT_EB_AD2.

The adder also contains a carry-in to the LSB cell, CIN_E_AD2_H. The carry-in is primarily used 
for performing subtraction operations. 

Since the adder is static, it begins its operation when the input data is valid at the start of the 
stage 1 execute cycle. Intermediate results in the exponent adder are latched in the second half 
of the execute cycle and sent to the detection logic and output selector. 

11.11.7 Exponent Difference Detection

The exponent difference detection is used to detect certain exponent values. The detection is done
on the output of both exponent adders, adder 1 and adder 2, and then selection of the exponent
difference is based on E_N. E_N is bit 12 of adder 1. It is used to select the detection results
from the positive adder output. The detection logic detects the following conditions:

Exponent Difference = 0                 E_DIFF_EQL_0
Exponent Difference > 1                 E_DIFF_GTR_1
Exponent Difference = 24                E_DIFF_EQL_24
Exponent Difference = 25                E_DIFF_EQL_25
Exponent Difference > 57                E_DIFF_GTR_57
E2 > E1                                 E_N
Exponent Difference <5:1> .NEQ. 0       E_DIFF_5_1_NEQ_0

The detection and latching are done at the start of the execute cycle.

In addition, the absolute value of the exponent difference is determined at the start of the stage
1 execute cycle. These lines, E_DIFFR<5:0>, are used to drive the inputs to the shift decoders in
stage 2.

The exponent block also generates a signal called E_DIFF_5_1_NEQ_0. This signal is asserted
when bits <5:1> of the positive exponent difference are not equal to zero.
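
The selection of the positive difference and the resulting flags can be illustrated with the
sketch below. It assumes, for illustration only, that adder 1 forms E1 - E2 and adder 2 forms
E2 - E1, and that both exponents are small enough for bit 12 to act as a sign bit. The function
name exponent_difference_flags is hypothetical; the dictionary keys follow the signal names
listed above.

```python
MASK13 = (1 << 13) - 1

def exponent_difference_flags(e1: int, e2: int) -> dict:
    ad1 = (e1 - e2) & MASK13     # assumption: adder 1 forms E1 - E2
    ad2 = (e2 - e1) & MASK13     # assumption: adder 2 forms E2 - E1
    e_n = (ad1 >> 12) & 1        # bit 12 of adder 1, set when E2 > E1
    diff = ad2 if e_n else ad1   # keep the result of the positive adder
    return {
        "E_N": bool(e_n),
        "E_DIFF_EQL_0": diff == 0,
        "E_DIFF_GTR_1": diff > 1,
        "E_DIFF_EQL_24": diff == 24,
        "E_DIFF_EQL_25": diff == 25,
        "E_DIFF_GTR_57": diff > 57,
        "E_DIFF_5_1_NEQ_0": ((diff >> 1) & 0x1F) != 0,  # bits <5:1>
        "E_DIFFR": diff & 0x3F,                         # bits <5:0>
    }
```

With e1 = 130 and e2 = 154, this model reports E_N asserted and an exponent difference of 24.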

11.11.8 Output Selector

The output data (ED1R) can be selected from four sources: ed1r, ed2r, e_ad1, or zero. The
selection is based on the assertion of the output select control signals. If
OSEL1_ED1R is asserted, then ed1r is selected. If OSEL1_ED2R is asserted, then ed2r is
selected. If OSEL1_E_AD1 is asserted, then e_ad1 is selected. If OSEL1_ZERO is asserted,
then the output of the selector is zeroed. The output of the selector is latched every cycle at the
end of the stage 1 execute cycle and driven into the following stage.

11.12 Sign Datapath

The sign bits of both operands are not modified within stage 1. They are used by the stage
1 control. The two sign bits S1 and S2 are latched in stage 1 and are passed to stage 2 of the
pipeline.

Figure 11-13: Sign Datapath Block Diagram 

11.13 Stage 1 Control

The control section in stage 1 receives the opcode from the interface. The control section
unconditionally decodes it every cycle. After a minimum of a one cycle delay, stage 1 will receive
operands from the input interface. If it is a one operand instruction, the input interface will
assert data valid, and stage 1 will perform the instruction. If it is a two operand instruction,
both operands are driven in the same cycle along with data valid.

11.13.1 Divide Instruction 

During a divide operation, the opcode, data valid, and two operands are passed to the divider and 
stage 1 by the interface. The divider and stage 1 will perform their portion of the divide operation. 
Stage 1 will deassert data valid. When the divider completes the divide operation, stage 1 will 
again receive the opcode. The following cycle stage 1 will receive data valid, divdone_dat, and 
quotient bits QS and QA. Stage 1 will compute the quotient and pass data valid and the quotient 
result to stage 2. The next cycle stage 1 will receive divdone_dat and the sum and carry vectors 
for the remainder. Stage 1 will compute the remainder and pass the sign of the remainder to 
stage 3. 

11.14 Fraction Datapath Operation Summary

Figure 11-14: Fraction Datapath Operation Table

11.15 Fraction Datapath Exception Summary

Figure 11-15: Fraction Datapath Exception Summary

11.16 Exponent Datapath Operation Summary

Figure 11-16: Exponent Datapath Operation Table

11.17 Exponent Datapath Exception Summary

Figure 11-17: Exponent Datapath Exception Table

NOTE:

• The exponents and signs are driven to stage 1 during both quotient and final remainder 
transfers to stage 1. 

11.17.1 Passthru Signals 

The MMGT_FLT_L, MEM_ERR_L, RSV_ADR_L and PSL_FU_H signals are simply passed through
stage-1 without change. They are latched coming in from the Input Interface during PHI_4 and
driven to Stage-2 during PHI_2.

The NEW_FOP_H signal also passes through to Stage-2 unaffected. It is latched during PHI_1 coming
from the Input Interface and driven to Stage-2 during PHI_3. This signal is gated with the global
purge signal F_I%PURGE_H from the input interface, which clears it on a PURGE from the
input interface. This signal is used by the Output Interface to manipulate its control-queue and
data-queue pointers.

11.18 STAGE 2 

11.18.1 Introduction

Stage 2 of the Fbox pipeline is composed of a fraction datapath, an exponent datapath, a sign 
datapath and a control block. Stage 2 receives all its data inputs from stage 1 and passes all its 
data outputs to stage 3. Stage 2 receives control inputs from stage 1 and the interface section, 
and passes control information to stage 3. The stage 2 fraction datapath has an array multiplier, 
a right shifter and detection logic. The detection logic is used to detect the bit position of the 
most significant bit in a number and if a number is equal to zero. The detection logic is also used 
to generate the sticky bit associated with the right shifter. The exponent datapath is composed 
of the standard exponent block, of which only the adder and the output selector are used, and an 
additional 6 bit data register. The sign bits are passed from stage 1 to stage 3 unchanged. 

The stage 2 fraction datapath performs operations on its input data for the following instructions: 
ADDf, SUBf, CMPf, TSTf, MULf, MULL, CVTif, CVTfi and CVTRfi. The ADDf and SUBf
instructions use the output of the right shifter and the detection logic. CMPf, TSTf, CVTif use 
the output of the detection logic. The MULf and MULL instructions use the output of the array 
multiplier. The CVTfi and CVTRfi instructions use the output of the right shifter. For all other 
instructions the stage 2 fraction output registers are either written with the unchanged input 
data passed from stage 1 or the contents are undefined. 

The stage 2 exponent datapath performs operations on its input data for the MULf and DIVf 
instructions. The adder in the exponent datapath is used to either add or subtract the appropriate 
exponent bias from the exponent data passed from stage 1. The output selector selects between 
the adder output, the input data from stage 1, and zero. For all instructions other than MULf 
and DIVf, the output selector passes the data passed from the stage 1 exponent datapath. 

The stage 2 control block generates all conditional datapath control signals and passes control 
information to stage 3. The control block must sequence the fraction multiplier for MULD/G 
and MULL instructions which require two consecutive cycles of execution in the stage 2 fraction 
datapath for generating the two vectors (carry and sum) used for forming the final product in 
stage 3. 

11.18.2 MUL Instruction Flows

Stage 2 is the stage of the Fbox pipeline that executes most of the computation needed for MULf
and MULL instructions. To clarify the need for the multiply hardware in stage 2, the basic MUL
flow is described. The multiplication algorithm implemented in the FBOX is the modified Booth
algorithm, which retires 3 multiplier bits at a time. The steps for calculating the product, or
the fraction portion of the product in the case of floating point operands, are as follows. First, the
multiples of the multiplicand that are required by the Booth algorithm are calculated and the
multiplier is recoded. Then the summands (a summand must be one of the calculated multiples
of the multiplicand) which are to be added together to form the product are selected based on
the recoded bits of the multiplier. Finally, the summands are added together and all terminal
operations (rounding, etc.) are performed as required by the particular instruction and datatype.
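
The three steps above can be illustrated with a textbook radix-8 modified Booth recoding, which
retires 3 multiplier bits per digit. The sketch below is a generic Python illustration, not a
description of the actual NVAX recoding logic; the function names and the treatment of the
multiplier as an unsigned value are assumptions.

```python
def booth_radix8_digits(multiplier: int, nbits: int) -> list[int]:
    """Recode an unsigned nbits-wide multiplier into digits in -4..+4,
    least significant digit first, 3 multiplier bits per digit."""
    if not 0 <= multiplier < (1 << nbits):
        raise ValueError("multiplier out of range")
    ndigits = (nbits + 3) // 3   # enough digits to leave a zero bit on top
    padded = multiplier << 1     # implicit 0 to the right of the LSB
    digits = []
    for i in range(ndigits):
        window = (padded >> (3 * i)) & 0xF   # multiplier bits 3i+2 .. 3i-1
        b_m1 = window & 1
        b0 = (window >> 1) & 1
        b1 = (window >> 2) & 1
        b2 = (window >> 3) & 1
        digits.append(-4 * b2 + 2 * b1 + b0 + b_m1)
    return digits

def booth_multiply(multiplicand: int, multiplier: int, nbits: int) -> int:
    """Select one multiple of the multiplicand per digit and add them up."""
    digits = booth_radix8_digits(multiplier, nbits)
    return sum((d * multiplicand) << (3 * i) for i, d in enumerate(digits))
```

In this sketch a 24-bit multiplier recodes into 9 digits, and every digit selects one of the
multiples 0, +/-1, +/-2, +/-3 or +/-4 times the multiplicand.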

The stage 1 fraction datapath basically calculates the required multiples of the multiplicand 
and recodes the multiplier. The stage 1 exponent datapath adds the exponents of the operands 
for floating point instructions. The signs of the operands are passed from stage 1 to stage 2
unchanged. The stage 2 fraction datapath selects the summands and performs carry-save addition 
on the summands. The stage 2 exponent datapath subtracts the appropriate exponent bias from 
the sum of the exponents calculated in stage 1. The signs of the operands are passed from 
stage 2 to stage 3 unchanged. The stage 3 fraction datapath forms the final product by doing a 
carry-propagate addition of the carry and sum vectors output from stage 2. The stage 3 exponent 
datapath decrements the exponent of the product if the fraction portion of the product needs to 
be normalized. Stage 3 also checks for potential data dependent stage-4 bypassable cases by 
carrying out a miniround on the lower 3 bits and the round bit. If the rounding operation doesn't 
carry past the 4 bits then stage-4 is bypassed. This bypass is aborted should stage-3 detect 
any exception or potential exception conditions. For a more detailed explanation refer to stage-3 
specification.

For all non stage-4 bypassable instructions, Stage 3 passes the signs of the operands unchanged to 
stage 4. The stage 4 fraction datapath performs all terminal operations on the product. For MULf 
instructions stage 4 rounds the fraction of the product and increments the exponent if the fraction 
overflows. Stage 4 also checks for floating overflow and underflow. For MULL instructions stage 
4 checks for integer overflow and forms and aligns the product for outputting to the interface.
Stage 4 generates the correct sign bit for floating and integer MUL instructions. 

Two consecutive cycles of execution in stage 2 are needed to complete all MUL instructions except 
MULF. This is due to the fact that 1 cycle is required for each pass through the multiply hardware 
in stage 2 and only F floating datatype multipliers can be completely retired in one pass. A more 
detailed description of the operations executed in the fraction datapaths of stages 1 through 4 is 
given below. 

Stage 1 passes the recoded multiplier, +1*multiplicand and +3*multiplicand to stage 2. The
multiples of the multiplicand required by the Booth algorithm are 0, +/-1*multiplicand,
+/-2*multiplicand, +/-3*multiplicand and +/-4*multiplicand. Stage 1 only calculates
+3*multiplicand because the other multiples are obtained by a simple shift of +1*multiplicand,
and all negative multiples are generated by two's complementing the positive multiples. In
order to reduce the number of computations executed in stage 2 for MULD, stage 1 also passes 
the summand selected from the recoded 3 LSB's (assuming D datatype) of the multiplier. The 
initial partial product is zero for all MUL instructions except for MULD. Stage 1 also passes two 
summands which are input to two rows of CS adders in stage 2 called MROW1 and MROW2, 
and a vector which facilitates generating the two's complement of the selected summands. The 
logic in stage 1 which determines the summands for MROW1 and MROW2 examines different
multiplier bits depending on the operand datatype and whether it needs to output summands for 
the first or the second pass through the stage 2 multiply hardware. The initial partial product is 
latched in the MIPPR, and the two summand inputs to the MROW1 and the MROW2 are latched 
in the MRW1R and the MRW2R, respectively. The vector used in stage 2 for two's complementing 
the selected summands is latched in the MTCR. 
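
The way all nine multiples can be derived from only +1*multiplicand and +3*multiplicand may be
sketched as follows. In this Python illustration a negative multiple is produced as the ones
complement of the positive multiple together with a separate +1 correction, which is assumed here
to play a role similar to the two's complement vector; the function name select_summand and the
fixed datapath width are hypothetical.

```python
def select_summand(digit: int, x1: int, x3: int, width: int) -> tuple[int, int]:
    """Return (summand_bits, correction) for a Booth digit in -4..+4.

    x1 is +1*multiplicand and x3 is +3*multiplicand; the 2x and 4x multiples
    are left shifts of x1. For a negative digit the ones complement is
    returned and correction is 1, completing the two's complement."""
    if not -4 <= digit <= 4:
        raise ValueError("digit out of range")
    mask = (1 << width) - 1
    magnitude = {0: 0, 1: x1, 2: x1 << 1, 3: x3, 4: x1 << 2}[abs(digit)]
    if digit < 0:
        return (~magnitude) & mask, 1
    return magnitude & mask, 0
```

Adding summand_bits and correction modulo 2**width gives digit times the multiplicand, so the
correction bits can be added into the array separately from the summands.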

Stage 2 selects all the summands which are needed to form the product, with the exception of 
the summands provided by stage 1. Stage 2 performs carry-save addition on the summands and 
outputs a carry and a sum vector to stage 3 for the formation of the final product. The multiply 
hardware in Stage 2 can be thought of as a 9 row, 3 bit retirement, carry-save multiply array 
which is capable of feeding its outputs back to its inputs for executing MULD/G and MULL
instructions. Each multiply array cell is composed of a selector which selects a summand and a 
carry save adder which adds the summand to the partial product. The first two physical rows 
of the array are called MROW1 and MROW2, and are different from the other 7 rows in that 
they have no selector. The selected summands for MROW1 and MROW2 are passed from stage 1 
in the MRW1R and MRW2R. During the first execute cycle of a MUL instruction, MROW1 adds 
the following three inputs from stage 1: the MIPPR output, the MRW1R output, and the MTCR 
output. During the second execute cycle, MROW1 adds the MRW1R output and the fed back 
MARRAY sum and carry outputs. 
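
The carry-save addition performed by each row can be illustrated with the sketch below. It models
a row as a 3-input, 2-output compressor applied across a word and accumulates a list of summands
into a sum vector and a carry vector. The names carry_save_add and accumulate and the fixed width
are assumptions; the shifting between rows and the 3 bit carry propagate cells of the real array
are not modeled.

```python
def carry_save_add(a: int, b: int, c: int, width: int) -> tuple[int, int]:
    """Add three words without propagating carries: returns (sum, carry)."""
    mask = (1 << width) - 1
    s = (a ^ b ^ c) & mask
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return s, carry

def accumulate(summands: list[int], width: int) -> tuple[int, int]:
    """Fold summands into a (sum, carry) pair, one carry-save row per summand."""
    mask = (1 << width) - 1
    s, c = 0, 0
    for x in summands:
        s, c = carry_save_add(s, c, x & mask, width)
    return s, c
```

The pair returned by accumulate is only resolved into a single number by a final carry-propagate
addition, s + c, which corresponds to the step the text assigns to stage 3.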

Stage 3 does the carry propagate addition on the carry and sum vectors passed from stage 2 
to form the final product and normalizes the product if necessary. Note that a left shift of 1 bit 
position is the maximum normalization possible. Actually two separate carry propagate additions 
are performed in stage 3. A 60 bit carry propagate addition is performed to form the fraction 
portion of floating point products and the high order 58 bits of integer products. A separate 6 
bit carry propagate addition is performed to form the 6 least significant bits of integer products. 
The carry out generated from the 6 bit addition is accounted for in the 60 bit addition so the 6 bit 
sum can be concatenated to the high order 58 bits. Stage 3 passes the results of both additions 
to stage 4. 
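
The split between the 6 bit addition and the wider addition can be sketched as follows. In this
simplified Python model the carry and sum vectors are single integers whose low 6 bits are added
separately, and the resulting carry out is fed into the addition of the remaining high order bits.
The function name and the representation of the vectors are assumptions; in the hardware the low
order bits arrive in separate registers.

```python
LOW_BITS = 6

def split_carry_propagate_add(carry_vec: int, sum_vec: int, high_bits: int = 58) -> int:
    """Add two vectors as a low 6 bit add plus a high add that absorbs its carry out."""
    low_mask = (1 << LOW_BITS) - 1
    low_total = (carry_vec & low_mask) + (sum_vec & low_mask)
    low_sum = low_total & low_mask
    carry_out = low_total >> LOW_BITS   # 0 or 1
    high_mask = (1 << high_bits) - 1
    high_sum = ((carry_vec >> LOW_BITS) + (sum_vec >> LOW_BITS) + carry_out) & high_mask
    return (high_sum << LOW_BITS) | low_sum   # concatenate high and low results
```

Because the carry out of the low addition is included in the high addition, the two results can
simply be concatenated.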

Stage 4 performs all the terminal operations (rounding, etc.) on the final product (except when 
stage-3 bypasses stage-4 operations) before passing the product to the interface section. Stage
4 handles detection of floating underflow, floating overflow, integer overflow, and the proper 
alignment of the product. 

11.19 Stage 2 Implementation Description

11.19.1 Fraction Datapath

Figure 11-18: Stage 2 Fraction Datapath Block Diagram

11.19.2 MSEL - Multiplier Selector



The MSEL is composed of two 62 bit, 2 to 1 selectors. One selector selects the carry input to 
the MROW1, the other selects the sum input. The two possible carry inputs to the MROW1 are 
the F_1%MTCR_L and zero in some bit positions, or the MCR fed back from the bottom of the
MARRAY. The two possible sum inputs to MROW1 are the F_1%MIPPR_L or the MSR fed back 
from the bottom of the MARRAY. If the signal MSEL_PASS_FB_H is asserted, then the MCR and 
MSR outputs are passed to the MSEL outputs, MCSELO and MSSELO, respectively. Otherwise 
the F_1%MTCR_L and the F_1%MIPPR_L are passed to MCSELO and MSSELO, respectively. 
The MSEL_PASS_FB_H signal is asserted during the second execute cycle of MULD/G and 
MULL. MCSELO<A2:B58> and MSSELO<A2:B58> are driven to the MROW1. 



11.19.3 MROW1 - Multiplier Row 1 

The MROW1 is composed of a row of 59 CS adders and a 3 bit carry propagate adder. The MROW1 
is actually the first physical row of the multiplier array but since the summand selection is 
performed in stage 1, the MROW1 has no summand selector. The CS adders perform a carry-save 
addition on MRW1R_L<A2:B55> (the summand), MCSELO<A2:B55> and MSSELO<A2:B55>.
The 3 bit carry propagate adder adds MCSELO<B56:B58> and MSSELO<B56:B58> and is needed
to maintain a correct partial product. The 3 bit carry propagate adder ensures that if the bits
of the C and the S vector which are shifted out of the array cause a carry of bit significance
B55, that carry is correctly added to the partial product. The MROW1 generates a carry output,
MRW1_C, and a sum output, MRW1_S, which are input to the MROW2. 

11.19.4 MROW2 - Multiplier Row 2

The MROW2 is composed of a row of 59 CS adders and a 3 bit carry propagate adder. The
MROW2 is the second physical row of the multiplier array, and like the MROW1, it has
no summand selector. The summand selection for the MROW2 is done in stage 1. The
CS adders perform a carry-save addition on MRW2R_L<A2:B55> (the summand), and sign
extended MRW1_C<A2:B53> and MRW1_S<A2:B52>. The 3 bit carry propagate adder adds
MRW1_C<B54:B55>, MRW1_S<B53:B55>, and the carry out of the 3 bit carry propagate adder
in the MROW1, MRW1_C<B56>. This 3 bit carry propagate adder is needed to maintain a correct
partial product. The 3 bit carry propagate adder ensures that if the bits of the C and the S vector
which are shifted out of the array cause a carry of bit significance B55, that carry is correctly 
added to the partial product. The MROW2 generates a carry output, MRW2_C, and a sum output, 
MRW2_S, which are input to the first row of the MARRAY. 

11.19.5 MARRAY - Multiplier Array

The MARRAY is a 3 bit retirement per row multiplier array which has 7 rows of multiplier 
cells. The MROW1, MROW2, and the MARRAY are used together to generate a carry and a 
sum vector which are added in stage 3 to produce the final product. The inputs to MARRAY 
are F_1%FD1R, F_1%FD2R, MRECR, MRW2_C, and MRW2_S. The F_1%FD2R and F_1%FD1R
contain 1*multiplicand and 3*multiplicand respectively for MUL instructions. The MRECR
contains the recoded multiplier bits. The MRW2_C and MRW2_S signals are the carry and sum
outputs of the MROW2. Each multiplier cell is composed of a selector and a CS adder. The 
selector selects the summand input and the CS adder adds the summand to the partial product. 
The MRECR[0:6]<5:0> control the summand selectors. The selector inputs are F_1%FD2R, 
F_1%FD2R left shifted by 1 bit position, F_1%FD1R, F_1%FD2R left shifted by 2 bit positions, 
or zero. The selector can generate the ones complement of any of the previously mentioned 
inputs for generating negative summands. The ones complement of zero is never generated. 
The MARRAY selector outputs are unconditionally latched in PHI_4. The least significant bit 
positions <B56:B58> in MARRAY, as in the MROW1 and MROW2, are populated by three bit 
carry propagate adder cells which are used to calculate carries which have the weight of the
<B55> bit position. The carry and sum outputs from the second row of the MARRAY, MA_C[1] 
and MA_S[1], are latched unconditionally in PHI_4. The least significant 5 carry and 6 sum 
outputs of the MARRAY cells in the fifth and sixth rows of the MARRAY are latched in the 
MILSBCR and MILSBSR. The MILSBCR and the MILSBSR are used in stage 3 to form the 6 
least significant bits of longword products. The carry and sum outputs from the last row of the 
MARRAY, MA_C[6] and MA_S[6], are latched unconditionally in PHI_2. The latched versions of
MA_C[6] and MA_S[6] are the MCR and MSR signals and are driven to the MSEL and stage 3. 

11.19.6 MILSBSR<5:0> - Multiplier Integer LSB Sum Register 

The MILSBSR is a 6 bit register. This register holds a 6 bit sum vector which is used to 
form the least significant 6 bits of the 64 bit product of longword operands. MILSBSR<5:3> 
are written with MA_S[5]<B53:B55>, and MILSBSR<2:0> are written with MA_S[4]<B53:B55> 
unconditionally in PHI_2. The contents of this register are undefined for instructions other than
MULL. The MILSBSR is driven to stage 3. 

11.19.7 MILSBCR<4:0> - Multiplier Integer LSB Carry Register

The MILSBCR is a 5 bit register. This register holds a 5 bit carry vector which is used to 
form the least significant 6 bits of the 64 bit product of longword operands. MILSBCR<4:3> 
are written with MA_C[5]<B54:B55>, and MILSBCR<2:0> is written with MA_C[4]<B54:B56> 
unconditionally in PHI_2. The contents of this register are undefined for instructions other than
MULL. The MILSBCR output is driven to stage 3. 

11.19.8 RSHIFT - Right Shifter 

The RSHIFT shifts F_1%FD1R to the right by 0 to 57 bit positions depending on the control 
signal FORCE_SHFT_0 and the output of the shift decoder, SDECO. The RSHIFT is used for 
pre-aligning operands in ADD and SUB instructions under certain conditions (for details see 
the description of the stage 2 control) and right shifting the fraction of a floating point operand 
in CVTFI/CVTRFI instructions. If the signal FORCE_SHFT_0 is asserted, the RSHIFT will 
pass F_1%FD1R<A0:B58> to its output RSHFTO<A0:B58> unshifted. If FORCE_SHFT_0 is 
deasserted, F_1%FD1R is passed to RSHFTO right shifted by 0 to 57 bit positions, depending on 
the state of SDECO<57:0> which has exactly 1 bit asserted. F_1%FD1R<A0> is always passed
to the RSHFTO<A0> output. The RSHFTO<B0:B57> bits which become vacant due to the right
shift of F_1%FD1R<B0:B58> are zero filled. The RSHIFT output, RSHFTO, is driven to the 
RSHFTOR. 

11.19.9 RSHFTOR<A0:B58> - Right Shifter Output Register 

The RSHFTOR is a 60 bit register which is written with RSHFTO<A0:B58> unconditionally in 
PHI_4. The RSHFTOR is driven to the FD1SEL. 

11.19.10 SDEC - Shift Decoders 

The SDEC decodes the F_1%E_DIFFR_H<5:0> from the stage 1 exponent datapath to a 58 bit 
output, SDECO<57:0>, which has exactly one bit asserted. The SDECO is the fully decoded 
right or left shift amount which is used to control the RSHIFT or the normalizer in stage 3, for 
ADD, SUB and CVTFI instructions under certain conditions (for details see the description of the 
stage 2 control). The assertion of SDECO<57> corresponds to a shift of zero. The assertion of 
SDECO<56> corresponds to a shift of 1, and so on. The SDEC output is driven to the RSHIFT, 
the SDECOR and the DETL. 
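
The one-hot decode and its use by the right shifter can be sketched as follows. This Python model
assumes, for illustration, that the 60 bit value F_1%FD1R<A0:B58> is held in an integer with A0
as the most significant bit and B58 as bit 0. The function names are hypothetical, and exponent
differences above 57 are not modeled.

```python
def sdec(e_diffr: int) -> int:
    """One-hot decode of a shift amount: SDECO<57 - n> is asserted for a shift of n."""
    if not 0 <= e_diffr <= 57:
        raise ValueError("shift amount outside 0..57")
    return 1 << (57 - e_diffr)

def shift_amount(sdeco: int) -> int:
    """Recover the shift amount from a one-hot SDECO<57:0> value."""
    if sdeco <= 0 or sdeco >= (1 << 58) or sdeco & (sdeco - 1):
        raise ValueError("SDECO must have exactly one bit asserted")
    return 57 - (sdeco.bit_length() - 1)

def rshift(fd1r: int, sdeco: int, force_shft_0: bool = False) -> int:
    """Right shift B0..B58 with zero fill; A0 (bit 59 here) always passes through."""
    a0 = fd1r & (1 << 59)
    frac = fd1r & ((1 << 59) - 1)   # B0..B58
    n = 0 if force_shft_0 else shift_amount(sdeco)
    return a0 | (frac >> n)
```

For example, rshift applied to a value with only A0 and B0 set and a decoded shift of 3 leaves A0
in place and moves the B0 bit to the B3 position.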

11.19.11 SDECOR<57:0> - Shift Decoder Output Register 

The SDECOR is a 58 bit register which is written with SDECO<57:0> unconditionally in PHI_4. 
The SDECOR is driven to the LSSEL. 

11.19.12 DETL - Detection Logic

The DETL detects if F_1%FD1R is equal to zero, generates outputs which are used by the leading 
one detection logic, L1DETL, and calculates the sticky bit for the adder in stage 3. The sticky bit 
is needed for ADD and SUB instructions under certain conditions (for details see the description 
of the stage 2 control). If the control signal DETL_EN_STKY_L is asserted, the DETL takes 
F_1%FD1R and SDECO as its inputs and calculates the sticky bit, STKYR. The sticky bit is 
set if a one in the F_1%FD1R is right shifted out of the B58 bit position by the RSHIFT. If 
F_2%SET_STKYR_H is asserted, STKYR is set independent of the DETL inputs. The STKY 
latch is written unconditionally in PHI_2 and is driven to stage 3. If DETL_EN_STKY_L is 
deasserted, the DETL takes only the F_1%FD1R as its input and it generates outputs which 
are used by the L1DETL. The DETL has two outputs, FZ and DETLO<65:0>. The FZ is the 
zero detection output and is driven to the stage 2 control block. The DETLO<65:0> outputs are 
driven to the DETLOR. FZ is asserted if F_1%FD1R<B0:B57> are all zeros. The FZ output is 
conditionally loaded in PHI_2 in the stage 2 control. 
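
The sticky bit and zero detection functions can be illustrated with the sketch below. It uses the
same assumed integer layout as a plain 60 bit word with A0 as bit 59 and B58 as bit 0, and it takes
the shift amount as a number rather than as the one-hot SDECO vector. The function names are
hypothetical.

```python
def sticky_bit(fd1r: int, shift: int, set_stkyr: bool = False) -> bool:
    """True if any 1 in B0..B58 is shifted out past B58, or if the override is asserted."""
    frac = fd1r & ((1 << 59) - 1)         # B0..B58, with B58 as bit 0
    lost = frac & ((1 << shift) - 1)      # the bits that fall off the right end
    return set_stkyr or lost != 0

def fraction_is_zero(fd1r: int) -> bool:
    """FZ: asserted when B0..B57 are all zero (A0 and B58 are ignored)."""
    return (fd1r >> 1) & ((1 << 58) - 1) == 0
```

A value with a single 1 three positions above B58 produces no sticky bit for a shift of 3, but
does for a shift of 4.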

11.19.13 DETLOR<B0:B57> - Detection Logic Output Register 

The DETLOR is a 58 bit register which is written with DETLO unconditionally in PHI_4. The
DETLOR is driven to the L1DETL. 

11.19.14 L1DETL - Leading 1 Detection Logic 

The L1DETL is used to determine the bit position of the leading or most significant bit of 
the F_1%FD1R<B0:B57>. If F_1%FD1R<A0> is a 1 then leading 1 detection is performed on 
the ones complement of F_1%FD1R<B0:B57>, otherwise leading 1 detection is performed on 
F_1%FD1R<B0:B57>. The L1DETL output, L1DETLO, is 58 bits wide and has exactly one bit 
asserted. The L1DETLO output determines the shift required to normalize (the normalizer is in 
stage 3) the F_1%FD1R in CVTIF and under certain conditions ADD and SUB instructions (for 
details see the description of the stage 2 control). If L1DETLO<B0> is set, the left shift amount
is zero. If L1DETLO<B1> is set, the left shift amount is one, and so on. If the signal E1Z_E2Z
is asserted, L1DETLO<B0> is set independent of the DETLOR outputs. If E1Z_E2Z is deasserted,
the L1DETL outputs depend on the DETLOR outputs. The L1DETLO is driven to the left shift 
selector LSSEL. 
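
Leading 1 detection as described above can be sketched as follows. The Python model again assumes
a 60 bit integer layout with A0 as bit 59, B0 as bit 58 and B58 as bit 0, and it returns the left
shift amount n (meaning that L1DETLO<Bn> would be the asserted bit) rather than the one-hot
vector. The function name is hypothetical, and the value returned when no leading bit exists is
an assumption, since the text does not describe that case.

```python
from typing import Optional

def leading_one_shift(fd1r: int, e1z_e2z: bool = False) -> Optional[int]:
    """Return n such that Bn is the leading bit of B0..B57 (complemented if A0 is 1)."""
    if e1z_e2z:
        return 0                                # B0 is forced when E1Z_E2Z is asserted
    field_mask = (1 << 58) - 1
    a0 = (fd1r >> 59) & 1
    field = (fd1r >> 1) & field_mask            # B0..B57, with B0 as bit 57 of field
    if a0:
        field = ~field & field_mask             # detect on the ones complement
    if field == 0:
        return None                             # assumption: no leading bit found
    return 58 - field.bit_length()
```

A value with only B2 set yields a left shift amount of 2 in this model.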

11.19.15 LSSEL - Left Shift Selector

The LSSEL is a 58 bit 2 to 1 selector which selects between L1DETLO<B0:B57> and
SDECOR<57:0>. If the signal LSSEL_PASS_SDECOR_H is asserted, then SDECOR<57:0> is
passed to LSSELO<B0:B57>. Otherwise L1DETLO<B0:B57> is passed to LSSELO<B0:B57>.
LSSEL_PASS_SDECOR_H is asserted if a CVTFI instruction is decoded, or if an effective
subtraction and an exponent difference greater than 1 are detected. LSSELO is driven to the LSHR.

11.19.16 LSENC - Left Shift Encoder

The LSENC does a binary encoding of the LSSELO<B0:B57> and drives the encoded signal,
LSENCO_DYN<5:0>, to the ED2R. The LSENCO_DYN signal is used in CVTIF and under certain
conditions ADD and SUB (for details see the description of the stage 2 control). LSENCO_DYN is
used to form the result exponent in CVTIF. LSENCO_DYN is used to correct the result exponent
due to normalizing the result in ADD and SUB. LSENCO_DYN<5:0> is driven to the ED2R.
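
The binary encoding of a one-hot vector can be sketched as an OR of the input positions whose
index has a given bit set. In this Python illustration the vector is a list in which entry k
stands for LSSELO<Bk>; the function name lsenc and this representation are assumptions.

```python
def lsenc(lsselo: list[int]) -> int:
    """Encode a 58-entry one-hot vector (entry k = bit Bk) as a 6 bit binary value."""
    if len(lsselo) != 58 or sorted(lsselo) != [0] * 57 + [1]:
        raise ValueError("expected a 58-entry vector with exactly one bit asserted")
    code = 0
    for j in range(6):
        # Output bit j is the OR of every input whose index has bit j set.
        bit = any(lsselo[k] for k in range(58) if (k >> j) & 1)
        code |= int(bit) << j
    return code
```

Six output bits are sufficient because the largest index, 57, is less than 64.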

11.19.17 LSHR<57:0> - Left Shifter Control Register 

The LSHR is a 58 bit register which is written with LSSELO<B0:B57> unconditionally in PHI_2. 
The contents of this register determine the number of bit positions the normalizer in stage 3 will 
shift its input data. Exactly one bit of LSHR<57:0> is asserted. The LSHR output is driven to 
stage 3. 

11.19.18 FD1SEL - Fraction Data 1 Selector 

The FD1SEL is a 60 bit 2 to 1 selector that selects the input to the stage 2 FD1R. 
If FD1_SEL_PASS_0_1 is asserted, zero is passed to the FD1SEL output, FD1SELO. If
FD1_SEL_PASS_0_1 is deasserted, then the RSHFTOR is passed to FD1SELO. The FD1SEL
output is driven to the FD1R. 

11.19.19 FD1R<A0:B58> - Stage 2 Fraction Data 1 Register 

The FD1R is a 60 bit register which is written with FD1SELO unconditionally in PHI_2. The
contents of this register for all instruction flows are given in the description of the stage 2 control. 
The FD1R output is driven to stage 3. 

11.19.20 FD2R<A0:B58> - Stage 2 Fraction Data 2 Register 

The FD2R is a 60 bit master/slave register. The master register is written with
F_1%FD2R unconditionally in PHI_4, and the slave register is written with the output of the 
master unconditionally in PHI_2. The contents of this register for all instruction flows are given 
in the description of the stage 2 control. The FD2R output is driven to stage 3. 

11.19.21 Exponent Datapath

Figure 11-19: Stage 2 Exponent Datapath Block Diagram

11.19.22 Zero Detection

This functional block is not used in stage 2.

11.19.23 Exponent Adder 1

The exponent adder is a 13-bit static adder used to add or subtract two inputs. Each input is 
passed through a 2 to 1 selector and inversion logic prior to the adder. INP_1A can be selected 
from ED1R or K. If ISEL1_ED1R_A is asserted, then ED1R is passed through the selector. If 
ISEL1_K_A is asserted, then K is passed through the selector. Inversion of the adder input 
is then done based on the assertion of INVERT_EA_AD1. INP_1B can be selected from ED2R
or K. If ISEL1_ED2R_B is asserted, then ED2R is passed through the selector. If ISEL1_K_B
is asserted, then K is passed through the selector. Inversion of the adder input is then done
based on the assertion of INVERT_EB_AD1. The adder also contains a carry-in to the LSB cell,
CIN_E_AD1_H. The carry-in is primarily used for performing subtraction operations. Since the
adder is static, it begins its operation when the input data is valid near the falling edge of phase 
1. Intermediate results in the exponent adder are latched in phase 3 and sent to the detection 
logic and output selector. 

For stage 2, INP_1A always selects ED1R not inverted. INP_1B always selects K. Inversion
of INP_1B is done based on the assertion of CIN_E_AD1. In other words, in stage 2
INVERT_EB_AD1 is shorted to and named CIN_E_AD1.

11.19.24 Floating Overflow and Underflow Detection 

This functional block is not used in stage 2. 

11.19.25 Output Selector 

The output selector is used to select the output data from three different sources: ed1r, e_ad1 or
zero. This selection is done for the exponent output data (ED1R), the floating overflow (F_OVFR)
and the floating underflow (F_UNFR). The selection is based on the assertion of two control
signals, OSEL1_ZERO and OSEL1_E_AD1. OSEL1_E_AD1, if asserted, selects the output from
E_AD1; for overflow and underflow, OSEL1_E_AD1 selects E_AD1_UNF and E_AD1_OVF. If
OSEL1_E_AD1 is deasserted, then the output is selected from ED1R; for overflow and underflow,
OSEL1_E_AD1 deasserted selects ED1R_OVF and ED1R_UNF. This selection is done using a
2 to 1 selector. The selection of zero is done prior to the 2 to 1 selector described above. If
OSEL1_ZERO is asserted, then the inputs from E_AD1 and ED1R entering the 2 to 1 selector
are both forced to zero. Then, since only one select line is used to control the selector, the zero
value will be transferred to the output regardless of the assertion of OSEL1_E_AD1. The output
of the selector is latched every phase 1 and driven into the following stage. 

11.19.26 ED2R<5:0> - Exponent Data 2 Register 

The ED2R is a 6 bit register which is written with LSENCO_DYN<5:0> unconditionally in PHI_2. 
The ED2R output is driven to the stage 3 exponent datapath. 

11.19.27 Sign Datapath

Figure 11-20: Sign Datapath Block Diagram

The 2 inputs to the stage 2 sign datapath are F_1%S1R_H and F_1%S2R_H. These bits correspond
to the sign of operand 1 and the sign of operand 2, respectively. The stage 2 datapath does not
perform any operations on the sign bits. S1R and S2R are 1 bit master-slave registers. The master
latches are written unconditionally in PHI_4 and the slave latches are written with the master
latch outputs unconditionally in PHI_2. The register outputs F_2%S1R_H and F_2%S2R_H are
driven to stage 3.
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11.19.28 Control 



Figure 11-21: Control Block Diagram 
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The stage 2 control block generates control signals for the stage 2 fraction and exponent datapaths 
based on opcode and control information passed from stage 1. The control block decodes the 
datapath control signals one cycle prior to the cycle in which they are needed in the datapaths. 
The control signals are latched in master slave latches to allow control decoding to overlap 
with datapath execution and to prevent races. The control block also loads control information 
output from stage 1 into master slave registers and passes the information to stage 3. The 
master slave latches which hold the SRC_DTR<2>, FOP_FLOWR<5:0>, DATA_VALIDR, and 
DST_DTR<2:0> signals are written in PHI_1 (master strobe) and PHI_3 (slave strobe). All 
other master slave latches are written unconditionally in PHI_4 and PHI_2. If the interface 
section asserts F_I%ABORT_H, the signals F_2%LAT_MUL2_H and F_2%DATA_VALIDR_H are 
deasserted. 
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The internal signal F_2%LAT_MUL2_h is used to facilitate the sequencing of the fraction 
multiplier and stalling of the DATA_VALIDR bit transfer for MULD, MULG, and MULL instructions. 
F_2%LAT_MUL2_H is asserted during the second decode cycle of MULD, MULG, and MULL 
instructions. If F_2%LAT_MUL2_H is asserted, the DATA_VALIDR bit will not be passed to 
stage 3, and the multiplier will select fed-back outputs as its inputs. F_2%LAT_MUL2_H is 
unconditionally deasserted one cycle after it is asserted. 

The stage 2 control block also modifies one bit of the internal opcode encoding, 
F_1%FOP_FLOWR_H<1>, before passing it to the stage 3 control if both an effective subtraction 
and an exponent difference greater than 1 are detected. It also contains the FZR bit latch and some 
logic to conditionally clear the latch. If an effective subtraction is decoded, and E1ZR XOR E2ZR 
is true, the FZR bit latch will be cleared. If this condition is not true, the FZR bit latch will be 
loaded with the FZ output of the DETL in the fraction datapath. 
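
The conditional clear of the FZR latch can be written as a one-line rule. The sketch below is a behavioral reading of the paragraph above; the function and argument names are hypothetical, and latch timing is ignored.

```python
def next_fzr(effective_sub, e1zr, e2zr, fz_from_detl):
    """Next value of the FZR bit latch (behavioral sketch, hypothetical name)."""
    # Exactly one zero operand in an effective subtraction clears the latch.
    if effective_sub and (e1zr != e2zr):
        return False
    # Otherwise the latch takes the FZ output of the DETL.
    return fz_from_detl
```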

11.19.28.1 Datapath Control Signals Output from Control Block 

F_2_C%CIN_E_AD1_H : This is the carry-in to the LSB position of the exponent datapath adder, E_AD1. 
This signal also controls the ones complementing of the exponent bias. If asserted the ones 
complement of the exponent bias is passed to the EB_AD1 output of the exponent complement 
logic. If deasserted, the true exponent bias is passed to the EB_AD1 output unchanged. 
F_2_C%CIN_E_AD1_H is asserted if a MULf instruction is decoded by the stage 2 control. 

F_2_C%DETL_EN_STKY_L : This enables the DETL to detect conditions for setting the sticky bit 
which is used by the stage 3 fraction adder. This signal is asserted if an effective subtraction is 
decoded and the exponent difference between the operands is greater than one. 

F_2_C%E_K_H<7> : This signal is an exponent bias which is driven to the INP_1B input of the 
exponent complement logic. This signal is the complement of F_2_C%E_K_H<10>. 

F_2_C%E_K_H<10> : This signal is an exponent bias which is driven to the INP_1B input of the 
exponent complement logic. This signal is asserted if the F_1_C%DST_DTR_H<2:0> decodes to 
G datatype. 

F_2_C%E1Z_E2Z_H : If this signal is asserted, the L1DETLO<B0> bit will be set (which indicates 
the contents of the F_1%FD1R is a normalized number) independent of the other inputs to the 
L1DETL. This signal is asserted if (F_1%E1ZR OR F_1%E2ZR) AND (effective sub) is detected. 

F_2_C%FD1SEL_PASS_0_L : If this signal is asserted then the FD1SEL will pass zeros 
to its outputs and the stage 2 FD1R will load in all zeros. This signal is asserted if 
F_1_E%EDIFF_GTR_57_H is asserted, and ADDf or SUBf or CVTfi or CVTRfi is decoded. 

F_2_C%LSSEL_PASS_SDECOR_H : If this signal is asserted F_2_P%SDECOR_H is passed 
to the LSSEL output, F_2%LSSELO_H. If deasserted, F_2_L%L1DETLO_H is passed to 
F_2_L%LSSELO_H. F_2_C%LSSEL_PASS_SDECOR_H is asserted if a CVTfi or CVTRfi 
instruction is decoded, or if an effective subtraction and exponent difference greater than 1 is 
detected. 

F_2_C%MSEL_PASS_FB_H : If this signal is asserted F_2_M%MCR_L is passed to the MSEL 
carry output, MCSELO, and the F_2_M%MSR_L is passed to the MSEL sum output. If 
deasserted, the F_1%MTCR is passed to MCSELO, with zeros in the vacant bit positions, and 
the F_1%MIPPR is passed to MSELO. F_2_C%MSEL_PASS_FB_H is asserted if the internal 
signal F_2_C%LAT_MUL2_H is asserted. 

F_2_C%OSEL1_E_AD1_H / F_2_C%OSEL1_ED1R_H : The OSEL1_E_AD1_H and the 
OSEL1_ED1R_H signals are complementary signals. If OSEL1_E_AD1_H is asserted, the 
exponent output selector, OSEL1, passes E_AD1, E_AD1_OVF and E_AD1_UNF to its outputs. 
If OSEL1_ED1R_H is asserted, ED1R, ED1R_OVF and ED1R_UNF are passed to the OSEL1 
outputs. OSEL1_E_AD1_H is asserted if a MUL or DIV is decoded by the stage 2 control. 

F_2_C%OSEL1_ZEROS_L : If this signal is asserted the exponent output selector, OSEL1, passes 
zero to its output. If deasserted, zero is not passed to the OSEL1 outputs. This signal is asserted 
if a MUL or a DIV is decoded, and F_1_E%E1ZR_H or F_1_E%E2ZR_H is asserted. 

F_2_C%SET_STKYR_H : If this signal is asserted the STKYR is forced to 1, independent 
of the state of the F_1%FD1R and the SDECOR. If deasserted, the state of the STKYR 
depends on the instruction flow and the data. F_2_C%SET_STKYR_H is asserted if 
F_1_E%E_DIFF_GTR_57R_H AND NOT(F_1%E1ZR OR F_1%E2ZR) is true. 

F_2_C%FORCE_SHFT_0_H : This signal forces the RSHIFT to pass the F_1%FD1R to its output 
unshifted. If this signal is deasserted, then the RSHIFT shifts the F_1%FD1R by the number of 
bit positions decoded by the SDEC. This signal is deasserted if an effective sub is decoded and the 
F_1_E%E_DIFF_GTR_1R_H is asserted, or if an effective add is decoded, or a CVTfi or a CVTRfi 
is decoded and F_1_E%E_NR_H is low. 
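
Several of the assertion conditions listed in this section are simple boolean functions of the decoded opcode and the stage 1 condition bits. The sketch below collects the ones that are stated unambiguously. It is an illustration, not the decoder itself: the class, the field names, and the opcode strings are assumptions, True means "asserted" regardless of the _H/_L polarity of the real signal, and "a MUL or DIV" is modeled as MULf/DIVf only.

```python
from dataclasses import dataclass


@dataclass
class Stage2Decode:
    """Hypothetical bundle of the conditions seen by the stage 2 control."""
    opcode: str                  # assumed mnemonics: "ADDf", "SUBf", "MULf", "DIVf", "CVTfi", "CVTRfi"
    effective_sub: bool = False
    e1zr: bool = False
    e2zr: bool = False
    ediff_gtr_1: bool = False
    ediff_gtr_57: bool = False
    lat_mul2: bool = False


def stage2_controls(d: Stage2Decode) -> dict:
    """Return True for each control signal that would be asserted."""
    mul_or_div = d.opcode in ("MULf", "DIVf")
    cvt = d.opcode in ("CVTfi", "CVTRfi")
    either_zero = d.e1zr or d.e2zr
    return {
        "CIN_E_AD1": d.opcode == "MULf",
        "DETL_EN_STKY": d.effective_sub and d.ediff_gtr_1,
        "E1Z_E2Z": either_zero and d.effective_sub,
        "FD1SEL_PASS_0": d.ediff_gtr_57 and (d.opcode in ("ADDf", "SUBf") or cvt),
        "LSSEL_PASS_SDECOR": cvt or (d.effective_sub and d.ediff_gtr_1),
        "MSEL_PASS_FB": d.lat_mul2,
        "OSEL1_E_AD1": mul_or_div,
        "OSEL1_ZEROS": mul_or_div and either_zero,
        "SET_STKYR": d.ediff_gtr_57 and not either_zero,
    }
```

FORCE_SHFT_0 is left out on purpose, since its description gives only the deassertion conditions.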
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11.19.29 Stage 2 Fraction Datapath Operation Summary 

The following tables summarize the operation of the stage 2 fraction and exponent datapaths. 
Figure 11-22: Fraction Datapath Operation Summary 
Figure 11-22 (Cont.): Fraction Datapath Operation Summary 

                  Conditions                     |          Fraction Datapath Registers and Outputs 
OPC    EFF A/S  E1Z  E2Z  E_N  EDIFF  M2  |  FD1R     FD2R      F_ZR  LSHR  STKYR  MC/SR  MILSBC/SR 

MULF   X        0    0    X    X      0   |  FD1R_1#  FD2R_1##  UD    UD    UD     C/S    UD 
MULF   X        1    0    X    X      0   |  FD1R_1*  FD2R_1*   UD    UD    UD     0      UD 
MULF   X        0    1    X    X      0   |  FD1R_1*  FD2R_1*   UD    UD    UD     0      UD 
MULF   X        1    1    X    X      0   |  FD1R_1*  FD2R_1*   UD    UD    UD     0      UD 
MULF   X        X    X    X    X      1   |  UD       UD        UD    UD    UD     UD     UD 
MULDG  X        0    0    X    X      0   |  FD1R_1#  FD2R_1##  UD    UD    UD     UD     UD 
MULDG  X        0    0    X    X      1   |  FD1R_1#  FD2R_1##  UD    UD    UD     C/S    UD 
MULDG  X        1    X    X    X      0   |  FD1R_1*  FD2R_1*   UD    UD    UD     UD     UD 
MULDG  X        X    1    X    X      0   |  FD1R_1*  FD2R_1*   UD    UD    UD     UD     UD 
MULDG  X        1    X    X    X      1   |  FD1R_1*  FD2R_1*   UD    UD    UD     0      UD 
MULDG  X        X    1    X    X      1   |  FD1R_1*  FD2R_1*   UD    UD    UD     0      UD 
MULL   X        X    X    X    X      0   |  FD1R_1#  FD2R_1##  UD    UD    UD     UD     UD 
MULL   X        X    X    X    X      1   |  FD1R_1#  FD2R_1##  UD    UD    UD     C/S 

OPC - OPCODE 
EFF A/S - Denotes that the operation is an effective ADD or SUB. 
ED - Exponent difference. 
M2 - Denotes the second execution cycle of a MUL instruction in stage 2. 
C/S - Denotes valid Carry and Sum vectors. 
... - Denotes that the stage 2 right shifter input is ... 
* - Stage 1 forces this register to be all zeros. 
# - Contains 3x multiplicand generated in stage 1. 
## - Contains multipli... 


DIGITAL CONFIDENTIAL 






NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 
Figure 11-23: Stage 2 Exponent Datapath Operation Summary 
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OPC - OPCODE 

EFF A/S - Denotes that the operation is an effective ADD or SUB. 
ED - Exponent difference. 
UD - Undefined 



11.19.30 Passthru Signals 

MMGT_FLT_L, MEM_ERR_L, RSV_ADR_L and PSL_FU_H signals are simply passed through 
Stage-2 without change. They are latched coming in from Stage-1 during PHI_4 and driven to 
Stage-3 during PHI_2. 

The NEW_FOP_H signal also passes through to Stage-3 unaffected. It is latched during PHI_1 
coming from Stage-1 and driven to Stage-3 during PHI_3. This signal is gated with the global 
purge signal F_I%PURGE_H from the input interface, which clears it on a PURGE from the 
input interface. This signal is used by the Output Interface to manipulate its control-queue and 
data-queue pointers. 
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11.20 STAGE 3 

11.20.1 Introduction 

Stage 3 of the pipeline is used primarily to left shift an input, or to perform the addition of 
two inputs. Stage 3 contains a control section and portions of the fraction, exponent, and sign 
datapaths. In addition, stage 3 has the capability to bypass stage 4 rounding operation for certain 
instructions. Stage 3 takes virtually all of its inputs from stage 2 of the pipeline, and drives its 
outputs either to stage 4 or to the output interface directly. 

The fraction datapath portion of stage 3 consists of a left shifter, an instantiation of the generic 
adder and some mini-rounding incrementers. The left shifter is used for convert and effective 
subtraction-like operations. The adder is used by all other operations either to pass an input 
to the output (by adding zero), or to add two vectors— for example, the two input operands 
(correctly aligned) for addition/subtraction, or the sum and carry vectors for multiplication. 
The mini-rounding incrementers are used to round the fraction result during a stage 4 bypass 
operation. Stage 3 also performs the injection of the sticky bit and increments the quotient, 
dependent on the sign of the remainder. The output of stage 3 is always normalized, where 
relevant. 

The exponent datapath consists of the generic exponent block. In this stage, the input selector, 
adder, and output selector are primarily used. For addition, subtraction, multiplication, and 
division, the adder is used to increment/decrement the input exponent according to whether the 
fraction addition can overflow/underflow. It also subtracts the left shift amount when the fraction 
portion performs a left shift. 

The sign datapath portion in stage 3 will generate the correct sign for the result during a 
successful stage 4 bypass. No operation is performed on the sign bit that is sent to stage 4. 

Some integer overflow detection logic is included in the control path. Additionally the six LSB's 
generated for MULL are combined, and a few stage 4 signals are generated. 

11.20.2 Stage 4 Bypass 

For a specific set of instructions and conditions, stage 3 can supply a result to the output interface 
directly. This is referred to as a "stage 4 bypass" and improves Fbox latency by supplying a 
result one full cycle earlier than the stage 4 supplied result. In order to bypass stage 4, stage 
3 must perform the required operations that stage 4 would normally perform under the same 
conditions. This includes rounding the fraction, supplying the correct exponent and generation 
of the condition codes and status information that is related to the result for floating ADD, SUB 
and MUL instructions. 

Stage 3 performs the rounding operation through the use of incrementers. These incrementers 
are much smaller in width than the number of fraction bits for a particular data type due to 
timing constraints. Because of the limited size of the incrementers not all fraction datums can 
be correctly rounded by stage 3. (The mini-round succeeds if the selected incrementer for a 
bypassable instruction does not generate carry out.) If the mini-round fails, the unmodified 
fraction is driven and the stage 4 bypass is aborted. 
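
The success/failure rule for the mini-round can be illustrated with a narrow incrementer applied to a bit field of the fraction. This is a sketch under assumptions: the function name, the LSB-0 bit numbering, and the choice of where the incrementer sits (`round_bit_pos`) are hypothetical, and the incrementer width is left as a parameter.

```python
def mini_round(fraction, round_bit_pos, width):
    """Add 1 at bit `round_bit_pos` using only a `width`-bit incrementer.

    Returns (result, success). On a carry out of the incrementer the
    fraction is returned unmodified and success is False.
    """
    mask = (1 << width) - 1
    field = (fraction >> round_bit_pos) & mask
    incremented = field + 1
    if incremented > mask:
        # Carry out of the small incrementer: the mini-round fails.
        return fraction, False
    cleared = fraction & ~(mask << round_bit_pos)
    return cleared | (incremented << round_bit_pos), True
```

A failing result corresponds to the case where the bypass is aborted and the full-width stage 4 rounder has to finish the job.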

Stage 3 and stage 4 share common busses to drive results to the output interface. Stage 4 will 
drive the busses, during phi3, if it has valid data. Stage 3 will drive the busses, during phi3, 
if it can successfully bypass an instruction and stage 4 does not have valid data. 
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The stage 3, stage 4 common busses are listed below: 

f_b%f_out_l<b0:b55>    fraction result bus 
f_b%e_out_l<10:0>      exponent result bus 
f_b%s_out              sign result bus 
f_b%n_out              psln result bus 
f_b%z_out              pslz result bus 
f_b%v_out              pslv result bus 



11.20.2.1 Stage 4 Bypass Request 

When stage 3 has detected that a stage 4 bypass may be possible, it signals the output interface 
by asserting the signal F_3%S4_BYPASS_REQR_H during phi4. 

All of the following conditions must be met in order to generate a stage 4 bypass request. 

o The signal F_I%S4_BYPASS_ENB_H must be asserted. 

o The FOP meets one of the following conditions. 

     SUB F,D,G ( except when ediff=0 AND the fraction result is negative ) 

o The signal F_2%DATA_VALIDR_H is asserted, indicating that the data present 
  at stage 3's input is valid. 

o The signal F_3%DATA_VALIDR_H is NOT asserted, indicating that a result 
  was not sent to stage 4 in the previous cycle. 

o There are no faults associated with the data. ( mmgt_flt, mem_err, rsv_adr ) 

o Neither of the two input operands are reserved operands. 

11.20.2.2 Stage 4 Bypass Abort 

In order to abort a stage 4 bypass, the signal F_3%S4_BYPASS_ABORTR_H must be asserted during 
phi2. Either of the two following conditions must be met in order to abort a stage 4 bypass 
assuming the bypass request was generated. 

o Mini-round failure. The selected mini-round incrementer carried out of its 
  most significant bit position. 

o Exponent overflow or underflow is detected on either of the two exponent 
  results in stage 3's exponent section, irrespective of the possible 
  1-bit left or right shift required for the fraction adder result. 
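
The request and abort rules in sections 11.20.2.1 and 11.20.2.2 reduce to two boolean predicates. The sketch below is a behavioral summary only; the function and argument names are hypothetical, and `op_bypassable` stands in for the opcode/condition check, which is not modeled here.

```python
def s4_bypass_request(enb, op_bypassable, s2_data_valid, s3_data_valid,
                      mmgt_flt, mem_err, rsv_adr, reserved_operand):
    """True when every condition for a stage 4 bypass request holds."""
    no_faults = not (mmgt_flt or mem_err or rsv_adr)
    return (enb and op_bypassable and s2_data_valid
            and not s3_data_valid and no_faults and not reserved_operand)


def s4_bypass_abort(requested, mini_round_carry_out, exp_ovf_or_unf):
    """True when a requested bypass has to be aborted."""
    return requested and (mini_round_carry_out or exp_ovf_or_unf)
```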

11.20.2.3 Stage 3 Response to FBOX Purge 

Stage 3 responds to the FBOX purge by clearing from stage 3 the data_valid flag and also the 
new_fop flag. 

11.20.3 Section Implementation Description 
11.20.3.1 Block Diagrams 

Stage 3 is made up of three sections: control, fraction, and exponent. On the following pages, 
block diagrams of the fraction and exponent datapaths are shown. 
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Figure 11-24: Stage 3 Fraction Datapath Block Diagram 
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Figure 11-25: Stage 3 Fraction Mini-round Block Diagram 
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Figure 11-26: Stage3 Exponent Datapath Block Diagram 
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11.20.4 Fraction Datapath 

The operations performed in the fraction datapath in this stage are shown in the following table. 



Table 11-11: Stage 3 Fraction Datapath Operations 

Category  Operation                                Condition 

F0        LSHFT.OUT <- FD1R.SHL.[LSHR]             EFF.SUBf, deltaE < 2, neither operand = 0; 
                                                   CVTif; CVTfi, left shift; CVTRfL, left shift 

F1        LSHFT.OUT <- FD2R.SHL.[LSHR]             EFF.SUBf, deltaE < 2, operand(s) = 0 

F2        SUM <- FD2R + FD1R                       EFF.ADDf; CVTff; MOVf; MNEGf; CMPf; 
                                                   TSTf; CVTfi, right shift 

F3        SUM <- FD2R + .not.FD1R + .not.STKYR     EFF.SUBf, deltaE > 1 

F4        SUM <- FD2R + FD1R + Rnddi               CVTRfL, right shift 

F5        SUM <- FD2R + FD1R + .not.F_NR           DIVf 

F6        SUM <- .not.MCR + .not.MSR               MULf; MULL 

11.20.4.1 Normalizer Input Selection 

The data to be left-shifted may be contained in F_2%FD1R_H or F_2%FD2R_H. The normalizer input 
selector is used to select between these two input registers. 



11.20.4.2 Left Shifter 

The left shifter is capable of performing zero to fifty-seven bit left shifts. The shift amount is 
driven on the LSHR lines in decoded form. The output of the left shifter is driven on LSHFT_OUT 
to the stage 3 output selector. For effective subtraction, exponent difference equal to zero, the 
output of the left shifter may be negative. The shift amount is forced to "shift of zero" if stage 3 
is in FBOX_Test mode or the chip is reset. 

11.20.4.3 Adder Input Selection 

The adder is driven with two input vectors: AIN and BIN. AIN can be FD2R or MSR; BIN can 
be FD1R or MCR. Note that for several operations, either FD1R or FD2R must be zero; the data 
is contained in the other register. 

These operations are: 

CVTff 

MOVf 

MNEGf 

CMPf 

TSTf 

CVTfi, right shift 
CVTRfL, right shift 
DIVf 

11.20.4.4 Adder 

The adder uses two 61-bit inputs to derive a 62-bit result. The 61-bit inputs have two bits above 
the binary point and 59 bits below; the 62-bit result has an extra bit above the binary point. In 
this stage, the most significant bit of each input is not used; neither are the two most significant 
bits of the output. 

The main carry acceleration technique used is carry select. The adder is broken up into nine 
small groups, with all but the least significant group having duplicate carry chains. These carry 
chains operate in parallel during the early part of the execute cycle. Propagate and generate 
logic operates before the carry chains. These parts of the adder are fully static. 

During the late part of the execute cycle, the sum logic executes. Just as for the carry logic, there 
is duplicate sum logic for all groups except the least significant one. In addition, logic to derive 
the true group carry out signals executes in these phases. These carry out signals are used to 
select the correct sum values. These parts of the adder are also fully static. 
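
The carry-select idea can be shown with a short bit-level model: each group forms its sum for both possible carry-in values, and the true carry from the group below picks one. This is a sketch only. The function name and the group sizes (nine groups from an even split of 61 bits) are assumptions, since the text does not give the group boundaries, and the model computes both candidates for every group, whereas the text says the least significant group has no duplicate chain.

```python
def carry_select_add(a, b, width=61, groups=9, cin=0):
    """Add two `width`-bit values group by group, carry-select style."""
    size = -(-width // groups)          # ceiling division: bits per group
    result, carry, lo = 0, cin, 0
    while lo < width:
        n = min(size, width - lo)
        mask = (1 << n) - 1
        ga, gb = (a >> lo) & mask, (b >> lo) & mask
        # Both candidate sums are formed up front (in parallel in hardware).
        sum0, sum1 = ga + gb, ga + gb + 1
        # The true carry out of the previous group selects the candidate.
        chosen = sum1 if carry else sum0
        result |= (chosen & mask) << lo
        carry = chosen >> n
        lo += n
    # The final carry becomes the extra result bit (62 bits for width=61).
    return result | (carry << width)
```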



NOTE FOR MULL: 

The adder in stage 3 adds the 58 MSB's generated by the multiplier array. <B58> of 
AIN and BIN is forced to zero for multiply operations. 

Shift Detection Logic: 

The most significant group of adder bits, bit positions <A2:B1>, is different from the groups below 
it. In this group, both the carry and sum logic execute during the early part of the execute cycle. 
Late in the execute cycle, shift detection logic executes. If enabled, it examines the sum bits 
<A0:B1> to determine whether a one bit shift right or left is needed to normalize the result. The 
possible values of sum bits <A0:B1> are given in the table below for each operation which may 
yield a non-normalized adder result. 

Table 11-12: Possible Values For Sum Bits <A0:B1> 

Result #1    Result #2    Result #3    Operation 

0.1X         1.XX         0.00         EFF.ADDf 
0.1X         0.01         0.00         EFF.SUBf, deltaE>1 
0.1X         0.01         0.00         MULf 
0.1X         1.XX         0.00         DIVf 



If the shift detection logic is disabled, then the signal indicating "no shift needed" will be asserted. 

This logic is also conditioned with another signal (sel_other) which is used to deassert all of the 
shift detection signals. Since these shift detection signals are used to drive the output selector for 
the stage, this feature permits the selection of a stage output other than the shifted or unshifted 
adder result. 

The logic used to control the shifting is as follows: 




Det_shr1 detects the case in which the fraction result is 1.XX...XX, and thus the fraction must be 
shifted right by one bit to be normalized. Det_pass detects several cases: first, the case in which 
the fraction result is 0.1XX...XX; second, the case in which the fraction result is zero (0.00XX...XX); 
last, the case in which shifting is disabled. Det_shl1 detects the case in which the fraction result 
is 0.01XX...XX, and the fraction must thus be shifted left by one bit to be normalized. 
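
The three detection cases can be expressed directly in terms of the sum bits <A0:B1>. The sketch below is one behavioral reading of the description; the function name, the argument names, and the treatment of the enable and sel_other inputs as simple booleans are assumptions.

```python
def shift_detect(a0, b0, b1, enabled=True, sel_other=False):
    """Decode sum bits <A0:B1> into (det_shr1, det_pass, det_shl1)."""
    if sel_other:
        # sel_other deasserts all of the shift detection signals.
        return (False, False, False)
    if not enabled:
        # With detection disabled, "no shift needed" is asserted.
        return (False, True, False)
    det_shr1 = a0 == 1                      # 1.XX: shift right one bit
    det_shl1 = (a0, b0, b1) == (0, 0, 1)    # 0.01: shift left one bit
    det_pass = not det_shr1 and not det_shl1  # 0.1X or 0.00: pass unshifted
    return (det_shr1, det_pass, det_shl1)
```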

The detection logic is duplicated, with one copy for each of the two sets of sum bits. This logic 
is fully static. The correct shift signals are selected dynamically by the true group carry out of 
the previous group, and driven out of the adder. A signal indicating whether a shift was done is 
driven to the exponent section, where it is used in selecting the proper exponent output. 

Bit Injection Within Adder: 

The adder performs rounding and two's complementing for all datatypes. The following table 
shows the bit positions into which injection is done. The bit positions are defined as c(y), meaning 
the carry in of the yth bit position. This carry in is derived by forcing a carry out to be generated 
in bit position (y-1). Only Rnddi is used in this stage. 

Table 11-13: Bit Injection Within Adder 

                 Type of Injection 

Rndf      Rnddi     Rndg      Cinb55_one 

c(B24)    c(B56)    c(B53)    c(B55) 



The carry in to bit position <B58> is set directly by the stage's control section. 

11.20.4.5 Mini-Round Incrementers 

These incrementers are used to round the fraction result supplied by either the left shifter or the 
adder. The incrementer for D and G type is four bits wide while the incrementer for F type is 
three bits wide. 

11.20.4.6 Output Selector 

The output selector is a precharged 1-of-4 selector. It selects either the left shifter output or the 
adder output (shifted left one bit position, passed unshifted, or shifted right one bit position). 
Three of the four selector control signals (the three adder output selection signals) are driven 
from the adder to the output selector; the fourth (the left shifter output selection signal) is driven 
from the control section. 
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11.20.4.7 Fraction Datapath Operation Summary (Normal Operating Mode): 



Figure 11-27: Fraction Datapath Operation Summary 
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OPC - Opcode 

EFF A/S - Effective Addition (A) or Effective Subtraction (S) 

SUM - Adder Output, shifted left/passed unshifted/shifted right as needed 

ED - Exponent Difference 

V - Valid data 

x - Don't care 
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11.20.5 Exponent Datapath 

The operations performed in the stage 3 exponent datapath adder are shown in the table below. 
Note that the exponent operation category numbers are unrelated to the fraction operation 
category numbers. 



Table 11-14: Exponent Datapath Operation Summary 

Category  Operation done in Adder            Condition 

E0        NOOP                               CMPf; TSTf 

E1        E_AD1 <- ED1R + K + 1              DIVf; EFF.ADDf 
          ( ER + 0 + 1 ) 

E2        E_AD1 <- ED1R + .not.K             MULf; MULL; EFF.SUBf, deltaE>1 
          ( ER - 1 ) 

E3        E_AD1 <- ED1R + .not.ED2R + 1      EFF.SUBf, deltaE<2 
          ( ER - NORM ) 

E4        E_AD1 <- K + .not.ED2R + 1         CVTif 
          ( BIAS1 - NORM ) 

E5        E_AD1 <- ED1R + .not.K + 1         CVTfi, CVTRfL 
          ( ER - BIAS2 ) 

E6        E_AD1 <- ED1R + K                  CVTff, MOVf, MNEGf 
          ( ER + BIAS3 ) 





11.20.5.1 Constants 

Five bits (bits <10>, <7>, and <5:3>) of the exponent constants are driven 
from the control section into the exponent section. The other eight constant bits are hardwired 
to ground within the exponent block. The constants needed in stage 3 are: 



K0    0000000000000       0 
      1111111111111      -1    NOT(K0) 
K1    0000010100000     160    CVT{B,W,L}{F,D} 
K2    0010000100000    1056    CVT{B,W,L}G 
K3    0000000011000      24    CVT{F,D,G}L/CVTR{F,D,G}L 
K4    0000000101000      40    CVT{F,D,G}W 
K5    0000000110000      48    CVT{F,D,G}B 
K6    0000010000000     128    CVT{D,G}F/CVTFD 
K7    0010000000000    1024    CVTFG 



K1 and K2 are the BIAS1 constants, used in CVTif; K3, K4, and K5 are the BIAS2 constants, used 
in CVTfi and CVTRfL; K6 and K7 are the BIAS3 constants used in CVTff, MOVf, and MNEGf. 
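
As a cross-check on the constants, the following sketch converts each 13-bit pattern to decimal and confirms that only the five control-driven bit positions (10, 7, 5, 4, 3) are ever set, which is consistent with the remaining eight bits being tied to ground. The dictionary and helper names are illustrative only.

```python
# 13-bit constant patterns as listed in this section (bit 12 on the left).
K_BITS = {
    "K1": "0000010100000",
    "K2": "0010000100000",
    "K3": "0000000011000",
    "K4": "0000000101000",
    "K5": "0000000110000",
    "K6": "0000010000000",
    "K7": "0010000000000",
}

DRIVEN_BITS = {10, 7, 5, 4, 3}   # bits supplied by the control section


def set_bits(pattern):
    """Return the set of bit positions (LSB = 0) that are 1 in `pattern`."""
    return {i for i, ch in enumerate(reversed(pattern)) if ch == "1"}


K_VALUES = {name: int(bits, 2) for name, bits in K_BITS.items()}
ONLY_DRIVEN_BITS_USED = all(set_bits(b) <= DRIVEN_BITS for b in K_BITS.values())
```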



11.20.5.2 Zero Detection 

The zero detectors are not used in stage 3. 
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11.20.5.3 Exponent Adder 1 

The exponent adder is a 13-bit static adder used to add or subtract two inputs. Each input is 
passed through a 2 to 1 selector and inversion logic prior to the adder. 

INP_1A can be selected from ED1R or K. If ISEL1_ED1R_A is asserted, then ED1R is passed 
through the selector. If ISEL1_K_A is asserted, then K is passed through the selector. Inversion 
of the adder input is then done based on the assertion of INVERT_EA_AD1. INP_1A is never 
inverted in this stage. 

INP_1B can be selected from ED2R or K. If ISEL1_ED2R_B is asserted, then ED2R is passed 
through the selector. If ISEL1_K_B is asserted, then K is passed through the selector. Inversion 
of the adder input is then done based on the assertion of INVERT_EB_AD1. 

The adder also contains a carry-in to the LSB cell, CIN_E_AD1_H. The carry-in is primarily used 
for performing subtraction operations. The table below gives the carry value for each exponent 
operation category: 

Table 11-15: LSB Carry-In Values 

Category    Carry In 

E0          d 
E1          1 
E2          0 
E3          1 
E4          1 
E5          1 
E6          0 



Since the adder is static, it begins its operation when the input data is valid near the falling 
edge of phase 2. Intermediate results in the exponent adder are valid by the middle part of the 
execute cycle and sent to the detection logic and output selector. 
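
The way inversion and the LSB carry-in combine to give subtraction in a 13-bit adder can be checked with a few lines of arithmetic. This is a numerical sketch, not a model of the circuit; the function name and the example exponent values are arbitrary.

```python
MASK13 = (1 << 13) - 1   # the exponent adder is 13 bits wide


def e_ad1(inp_1a, inp_1b, invert_b, cin):
    """13-bit add with optional ones-complement of the B input and LSB carry-in."""
    b = (~inp_1b & MASK13) if invert_b else (inp_1b & MASK13)
    return (inp_1a + b + cin) & MASK13


# ED1R + .not.K + 1 is a two's complement subtraction, ED1R - K.
er_minus_bias = e_ad1(200, 24, invert_b=True, cin=1)
# ED1R + .not.K with K = 0 and no carry-in gives ED1R - 1.
er_minus_one = e_ad1(200, 0, invert_b=True, cin=0)
# ED1R + K + 1 with K = 0 gives ED1R + 1.
er_plus_one = e_ad1(200, 0, invert_b=False, cin=1)
```

Inverting the B input alone yields a decrement when K is zero, which is why that category needs no carry-in.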

11.20.5.4 Output Selector 

The output selector is used to select the output data from three different sources: ED1R, E_AD1 or 
zero. This selection is done for the exponent output data (ED1R), the floating overflow (F_OVFR) 
and the floating underflow (F_UNFR). The selection is based on the assertion of two control 
signals, OSEL1_ZERO and OSEL1_E_AD1. 

OSEL1_E_AD1, if asserted, selects the output from E_AD1; for overflow and underflow, 
OSEL1_E_AD1 selects E_AD1_UNF and E_AD1_OVF. If OSEL1_E_AD1 is deasserted, then the 
output is selected from ED1R; for overflow and underflow, OSEL1_E_AD1 deasserted selects 
ED1R_OVF and ED1R_UNF. This selection is done using a 2 to 1 selector. 

The selection of zero is done prior to the 2 to 1 selector described above. If OSEL1_ZERO is 
asserted, then the inputs from E_AD1 and ED1R entering the 2 to 1 selector are both forced 
to zero. Then, since only one select line is used to control the selector, the zero value will be 
transferred to the output regardless of the assertion of OSEL1_E_AD1. 

The output of the selector is latched every cycle and driven into the following stage. 
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The selection of the exponent output is shown in the following table. 



Table 11-16: Exponent Output Selection 

Category   Select ED1R if:             Select E_AD1 if:     Select Zero if: 

E0         always select 

E1         fraction passed unshifted   fraction shifted     DIV: (e1zr + e2zr) 
                                                            EFF.ADDf: (e1zr * e2zr) 

E2         fraction passed unshifted   fraction shifted     MUL: (e1zr + e2zr) 
                                                            EFF.SUBf: (e1zr * e2zr) 

E3                                     always select        EFF.SUBf, deltaE=0: (f_zr) 
                                                            EFF.SUBf: (e1zr * e2zr) 

E4                                     always select        (f_zr) 

E5                                     always select 

E6                                     always select        (e1zr) 



As shown in the table above, some selection operations are dependent only on the operation 
category, while others also depend on whether the fraction adder result needed a one bit 
normalization. The control section implements the following equation: 



OSEL1_E_AD1 = 1            if GSEL1_E_AD1 

OSEL1_E_AD1 = 0            if GSEL1_ED1R 

OSEL1_E_AD1 = SHFT_DONE    if (GSEL1_E_AD1 * GSEL1_ED1R) + RESET 

GSEL1_E_AD1 and GSEL1_ED1R are generated in the control section, based on the opcode. 
SHFT_DONE is generated in the adder, based on the value of the adder output. If RESET is 
asserted, SHFT_DONE selects the exponent output. 
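
One way to read the selection equation is as a three-way choice: a fixed 1, a fixed 0, or the SHFT_DONE signal from the fraction adder. The sketch below encodes that reading. It is an interpretation, not the control logic itself; the function name is hypothetical, and the case in which neither GSEL signal is asserted is not defined by the text (the sketch returns False there).

```python
def osel1_e_ad1(gsel1_e_ad1, gsel1_ed1r, shft_done, reset=False):
    """Select line for the exponent output selector (behavioral reading)."""
    # Both category selects asserted, or RESET: the adder's shift result decides.
    if reset or (gsel1_e_ad1 and gsel1_ed1r):
        return shft_done
    # Only one asserted: GSEL1_E_AD1 forces 1, GSEL1_ED1R forces 0.
    return gsel1_e_ad1
```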

The overflow and underflow outputs that are selected are never used. 
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11.20.5.5 Exponent Datapath Operation Summary (Normal Operating Mode): 
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11.20.6 Sign Datapath 

The operation done in the sign datapath portion of stage 3 is shown in the table below. 



Table 11-17: Stage 3 Sign Datapath Operations 

Category  Operation                      Condition 

S0        f_3%s1r_h <- f_2%s1r_h         always performed 
          f_3%s2r_h <- f_2%s2r_h 

S1        f_b%s_out_l <- f_3%bp_plsn     performed during stage 4 bypass 

11.20.7 Control 





The control section generates all the control signals needed for stage 3, based on the opcode and 
several condition signals, such as E1ZR and F_ZR. It sends the opcode and necessary condition 
signals to stage 4. In addition, it contains some integer overflow detection logic, a 6-bit adder 
used in MULL, and logic to generate some control signals needed by stage 4. 
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The following table shows which categories of operations are performed in the fraction, exponent, 
and sign datapath portions of stage 3 for each opcode. Each category indicates a unique set of 
control signals to be driven. The control section generates these combinations of categories. 



Table 11-18: Categories of Datapath Operations 

Opcode Operation                      Categories of Datapath Operations 
                                      Fraction    Exponent    Sign 

CVTff, MOVf, MNEGf                    F2          E6          S0 
CVTif                                 F0          E4          S0 
CVTfi, right shift                    F2          E5          S0 
CVTfi, CVTRfL: left shift             F0          E5          S0 
CVTRfL, right shift                   F4          E5          S0 
DIVf                                  F5          E1          S0 
EFF.ADDf                              F2          E1          S0 
EFF.SUBf, deltaE<2, opnds <> 0        F0          E3          S0 
EFF.SUBf, deltaE<2, opnd(s) = 0       F1          E3          S0 
EFF.SUBf, deltaE>1                    F3          E2          S0 
CMPf; TSTf                            F2          E0          S0 
MULf, MULL                            F6          E2          S0 



11.20.7.1 Miscellaneous Control Signals 

Most of the stage 3 control signals are generated in the control decoders, but some are generated 
or conditioned external to the decoders. These signals are described in this section. 

11.20.7.2 Data_Valid 

The data_valid signal sent to stage 4 is received from stage 2 and is enabled when there is no 
FBOX flush occurring and a stage 4 bypass is also not occurring. The equation for enabling 
F_3_C%3_DATA_VALIDR_H is as follows: 

f_3%s3_dv_enb = NOT f_i%abort_h AND (f_3%s4_bypass_abortr_h OR 
                NOT ( f_3%s4_bypass_enb AND f_3%s4_bypass_reqr_h )) 

This operation is performed before the end of the execute cycle. 
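
The enable equation can be exercised as a plain boolean function. The sketch below mirrors the equation term for term; the function and argument names are shortened forms chosen for illustration.

```python
def s3_dv_enb(abort, bypass_abortr, bypass_enb, bypass_reqr):
    """Enable for passing data_valid from stage 3 to stage 4."""
    return (not abort) and (bypass_abortr or not (bypass_enb and bypass_reqr))
```

Data_valid is passed on either when no bypass was requested or when a requested bypass was aborted, and never during an abort from the interface.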

11.20.7.3 Fault Bits and NEW_FOP 

There are three fault signals associated with each valid data that flows through the FBOX pipe. 
In addition to these three fault signals there is one more signal (new_fop) which indicates that 
a new FBOX operation is coming through the FBOX pipe. The three fault signals are 
named F_3%MMGT_FLT_L, F_3%MEM_ERR_L, and F_3%RSV_ADR_L. A stage 4 bypass request cannot be 
generated if any of the fault lines are asserted. The new_fop signal is cleared out of the FBOX 
pipe whenever an FBOX purge occurs. 
11.20.7.4 Signs_Not_Eql, Fb_Neg4

Stage 3 generates two signals for use in stage 4. These signals are signs_not_eqlr_h and
fb_neg4r_h. The equation for signs_not_eql is:

SIGNS_NOT_EQL = S1 XOR S2

FB_NEG4 is the signal used to negate the B input to the stage 4 fraction adder. The input is
negated if stage 4 needs to perform a two's complement. The equation implemented is:

FB_NEG4 = [(EFFSUB * E_DIFF_EQL_0 * F_2%F_N) +
           (CVTFI * S1R) +
           (SIGNS_NOT_EQL)] * F_I%FBOX_BYPASS_H

11.20.7.5 Integer Overflow Logic 

Some of the logic used to detect the integer overflow condition for CVTfi and CVTRfL is located
in stage 3. This static logic operates unconditionally, and its outputs are used by stage 4 when
needed.

The first function is IOVFL3. It implements the equation

IOVFL3 <- S1R * [(DEST_DT<BYTE> * NOT MNEGBR) +
                 (DEST_DT<WORD> * NOT MNEGWR) +
                 (DEST_DT<LONG> * NOT MNEGLR)]

IOVFL3 detects integer overflow for CVTfi and CVTRfL (no round up), in the case where the 
hidden bit of the fraction becomes the MSB of the integer, and the sign is negative. In this case, 
a two's complement must be performed on the integer. If the integer is 100.. .00, no overflow 
will occur since the result of the two's complement will be 100.. .00, a negative number. This 
happens because in N bits, more negative numbers (one more) can be represented using two's 
complement than positive numbers. Thus, there is no positive equivalent of the most negative 
number (100...00). If the integer is not 100...00, overflow will occur since the result of the two's 
complement will be 0XX...XX, a positive number. 
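
The asymmetry of two's complement described above can be checked with a short sketch. This is an illustration only, not the stage 3 logic; the function names and the choice of Python are assumptions made for the example.

```python
def twos_complement(value: int, nbits: int) -> int:
    """Return the two's complement of value, truncated to nbits."""
    mask = (1 << nbits) - 1
    return (~value + 1) & mask


def negation_overflows(magnitude: int, nbits: int) -> bool:
    """True if negating magnitude overflows a signed nbits integer.

    magnitude is assumed to have its MSB set (the hidden bit has become
    the MSB of the integer), which is the case IOVFL3 covers.
    """
    result = twos_complement(magnitude, nbits)
    msb = (result >> (nbits - 1)) & 1
    # A result of the form 0XX...XX reads as positive, so the negative
    # value was not representable.
    return msb == 0
```

For an 8-bit destination, 1000 0000 negates to itself (no overflow), while 1000 0001 negates to 0111 1111, which has the wrong sign.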

The second function is IOVFL4. It implements the equation 

IOVFL4 <- (IOVFL4A + IOVFL4B) * CVTRfL

IOVFL4A <- LNEGIR * NOT S1R * E_DIFF_EQL_25R

IOVFL4B <- S1R * E_DIFF_EQL_24R * CRFL_RNDR * MNEGLR

IOVFL4 detects integer overflow for CVTRfL, in the case when rounding causes the integer to
be incremented. IOVFL4A detects the case where the integer is 011...11, the result should be
positive, and a round up occurs. IOVFL4B is used to detect a case not covered by IOVFL3. In
general, if the hidden bit of the fraction becomes the MSB of the integer and the sign bit is 
negative, overflow will occur unless the integer is 100...00. However, for CVTRfL, overflow will 
also occur for an integer equal to 100...00 if the integer must be rounded up. IOVFL4B covers 
this case. 

IOVFL3 and IOVFL4 are sent to stage 4, which calculates the final integer overflow result. 
11.20.7.6 Cin_B58

The carry in to bit position <B58> of the fraction adder is generated outside the control decoders,
using control signals generated by the decoders. (See Fraction Datapath Operation Summary.)

CIN_B58 = F_1%F_N    if operation is DIVIDE
CIN_B58 = STKY       if operation is EFFSUB, EDIFF>1
CIN_B58 = 0          otherwise

The decoders generate the signals indicating the operation type. 

11.20.7.7 Sel_Other 

The sel_other signal is used in the adder and output selector in order to permit selection of the 
normalizer output as the stage output. For all operations except CVTfi, the value of this signal is 
determined by the operation. For CVTfi, it is determined by the sign of the exponent difference 
obtained in stage 1. If the exponent difference is negative, a left shift is performed on the fraction, 
and stage 3 must select the normalizer output. If the exponent difference is positive, a right shift 
is performed, and stage 3 selects the adder output. (See Fraction Datapath Operation Summary.) 
Finally, the normalizer output is always selected in FBOX_Test mode and when the chip is reset. 
The equation implemented is the following:

SEL_OTHER = (NOT CVTFI * F_3_C%SEL_OTHER_H) +
            (CVTFI * F_3%SEL_OTHER_H) +
            F_I%FBOX_BYPASS_H +
            RESET

The control decoders generate the signal F_3_C%SEL_OTHER_H, used for all operations except 
CVTFI. 

11.20.7.8 Left Shifter Input Selection Signals 

There are two left shifter input selection signals: F_3%LSHFT_FD1R_H and F_3%LSHFT_FD2R_H.
Either F_2%FD1R or F_2%FD2R may hold the input to be left-shifted. (See Fraction Datapath
Operation Summary.) F_2%FD2R holds the input if the operation is effective subtraction, with
either input equal to zero. For all other operations, F_2%FD1R holds the input to be shifted. The
equations implemented are the following:

LSHFT_FD1R = [F_I%FBOX_BYPASS_L * NOT (EFFSUB * (E1Z + E2Z))] +
             [F_I%FBOX_BYPASS_H * F_I%S4_BYPASS_ENB_L]

LSHFT_FD2R = [F_I%FBOX_BYPASS_L * (EFFSUB * (E1Z + E2Z))] +
             [F_I%FBOX_BYPASS_H * F_I%S4_BYPASS_ENB_L]




11.20.7.9 Osel2_Zero

This signal is used to force the stage 3 exponent output to zero. The equation implemented is as
follows (see description of the output selector in the exponent section):

OSEL2_ZERO = { [DIV * (E1Z + E2Z)] +
               [MUL * (E1Z + E2Z)] +
               [EFFADD * (E1Z * E2Z)] +
               [EFFSUB * (E1Z * E2Z)] +
               [EFFSUB * E_DIFF_EQL_0 * F_Z] +
               [CVTif * F_Z] +
               [CVTff * E1Z] +
               [MOVf * E1Z] +
               [MNEGf * E1Z] } * F_I%FBOX_BYPASS_H



11.20.7.10 Osel1_Ed1r 

This signal is used to select the stage 3 exponent output. If it is asserted, the contents of 
F_SG2%ED1R are chosen as the stage output; otherwise, the exponent adder output is chosen as 
the stage output. 




OSEL1_ED1R = 1          if [CMPf +
                            TSTf +
                            F_I%FBOX_BYPASS_H] * NOT RESET

OSEL1_ED1R = SHFT_DONE  if [(EFFSUB * E_DIFF_GTR_1) +
                            EFFADD +
                            MUL +
                            DIV] * F_I%FBOX_BYPASS_L +
                            RESET



11.20.7.11 MULL Adder 

The multiplier array in stage 2 generates 64-bit sum and carry vectors for MULL. The 58 MSB's 
are combined in the fraction adder in stage 3. The 6 LSB's (<B58:B63>) of each vector must be 
added together in the control section of stage 3. The six sum bits generated are sent to stage 4 
(as are the MSB sum bits). Any carry out of the six LSB's has been previously incorporated in 
the MSB's in stage 2. 



11.21 STAGE 4 

Stage 4 of the pipe performs the terminal operations of an instruction. It either rounds the
stage 3 result or takes its 2's complement. The result of stage 4 is the final result which is
sent to the interface section. Stage 4 finds the sign of the final floating result and outputs it to
the interface. Stage 4 also detects the following conditions: integer overflow, floating overflow,
floating underflow, zero result, negative result, reserved operand and floating divide by zero.
In addition, it sets the correct condition codes (PSL.Z and PSL.N). Stage 4 also checks
whether the condition for a CMP or TST instruction is met. For CMP, the correct condition
codes are set. During any CMP instruction, stage 4 forces the fraction and exponent datapath
output to zero. When reset is asserted, only one path of the selector is enabled in the fraction
adder selector logic.

11.22 FRACTION DATAPATH 



Figure 11-28: Fraction Datapath Block Diagram 
11.22.1 Fraction Implementation Description

FRACTION DETECTION LOGIC 

The detection logic in the fraction datapath is connected directly to the output from stage 3. The 
F_3%FD1R_H and F_3%MILSBR_H outputs from stage 3 are necessary for the detection logic. 
The detection logic works unconditionally and no control signals are provided to the logic except 
clocks. The detection logic detects a zero result for MULL and CVTfi instructions. It also detects 
overflow for the MULL instruction. Overflow for MULL occurs when the 32 msb's of the 64-bit 
result are not equal to the sign extension of the low half (32 lsb). 

SELECTOR 

The selector drives the selected input into the adder. The selector selects F_3%FD1R_H either
unshifted or shifted left by eight bits. It can also negate the selected input. The control
inputs to the selector are SEL_MULL_L, SEL_MULL_H, FB_NEGR_H, and FB_NEGR_L. If
SEL_MULL_H is high, the instruction is MULL, and F_3%FD1R_H and
F_3_C%MILSBR_H are selected, shifted left by eight bits. If SEL_MULL_H is low,
F_3%FD1R_H is selected without any shifting. If FB_NEGR_H is high, the selected input is
complemented. The complementing is necessary for doing a 2's complement if certain conditions
are satisfied for EFFSUB and CVTfi instructions.

ADDER 

The adder is used for the terminal operation on the result, i.e., for rounding, for finding the 2's
complement of the result, and for adding zero to the input. The last case is used when the input to
stage 4 is to be passed as the output of stage 4. The adder also drives the result selection signals.
One input (FB) to the adder is F_4_A%BIN_H and the other input (FA) is always zero. The
RNDx, CIN_B58 and CINB55_ONE signals are driven to the adder by the control of stage 4.

SHIFT DETECTION LOGIC OF ADDER 

If enabled, the adder examines the sum bits <A0:B1> to determine whether a one bit shift right
is needed to normalize the result. The instructions which may require a one bit right shift are:
EFFADDf, EFFSUBf, MULf, DIVf, CVTif and CVTff. For all these instructions the result from
the stage 4 fraction adder could be of the form 0.1XX..., 0.00..., or 1.XX... .

If the shift detection logic is disabled, then the signal indicating "no shift needed" will be forced 
valid. This logic is also conditioned with another signal, which is used to force all of the shift 
detection signals to their invalid value. Since these shift detection signals are used to drive the 
output selector for the stage, this feature permits the selection of a stage output other than the 
shifted or unshifted adder result. 

The logic used to control the shifting is as follows: 

f_4_a%det_shr1_h = A0 * shift_en * NOT sel_other

f_4_a%det_pass_h = {[(NOT A0 * B0 + NOT A0 * NOT B0 * NOT B1) * shift_en] + NOT shift_en} * NOT sel_other



DET_SHR1 detects the case in which the fraction result is 1.XX..XX, and thus the fraction must
be shifted right by one bit to be normalized. DET_PASS detects several cases: first, the case
in which the fraction result is 0.1XX..XX; second, the case in which the fraction result is zero
(0.00XX..XX); last, the case in which the shifter is disabled.
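
The DET_SHR1 and DET_PASS cases described above can be sketched as a small truth function. This is a hedged illustration: the complemented terms are inferred from the case descriptions (the scan does not preserve overbars), and the function name and argument order are assumptions made for the example.

```python
def shift_detect(a0: int, b0: int, b1: int, shift_en: int, sel_other: int):
    """Return (det_shr1, det_pass) for single-bit inputs (0 or 1).

    a0, b0, b1 are the adder sum bits <A0:B1>; sel_other forces both
    detection signals to their invalid (0) value.
    """
    def inv(x: int) -> int:
        return x ^ 1

    # 1.XX..XX: a one bit right shift is needed.
    det_shr1 = a0 & shift_en & inv(sel_other)

    # 0.1XX..XX, or 0.00XX..XX (zero result), or shift detection disabled.
    pass_case = (inv(a0) & b0) | (inv(a0) & inv(b0) & inv(b1))
    det_pass = ((pass_case & shift_en) | inv(shift_en)) & inv(sel_other)
    return det_shr1, det_pass
```

With sel_other asserted neither signal is valid, which is what lets the output selector choose something other than the shifted or unshifted adder result.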
1-BIT RIGHT SHIFTER

The input to the 1-bit shifter is the adder result. The outputs of the 1-bit shifter are the adder
result unshifted (RESN) and the adder result shifted right by 1 bit (RESS). The 1-bit shifter
works unconditionally. The shifter is used to right shift the result when fraction overflow
occurs for floating point instructions. If fraction overflow has occurred then the shifted
result is used, otherwise the unshifted result is used.

RSELECTOR 

The RSELECTOR selects the final result for an instruction. The output of the selector is latched in 
PHI_2 which is passed to the interface. The inputs to the RSELECTOR are the two outputs from 
the 1-bit shifter and zero. For the CMP instruction and for floating destination type instructions 
if the final result is zero then it selects zero. For all other instructions the selector selects the 
1-bit right shifter output (RESN or RESS). 

BUS DRIVERS 

The BUS DRIVER section drives the final stage 4 result to the output interface on an active-low
precharged bus, F_B%F_OUT_L<B1:B55>. This bus is shared with stage 3, which uses it to bypass
stage 4 for certain instructions. The input to the BUS DRIVER section is F_4%FD1R_H<B1:B55>.
During PHI_3, if stage 4's data_valid bit is set and the underflow condition is not detected, the
inverted value of F_4%FD1R_H<B1:B55> is driven onto the bus. If underflow is detected then
the bus is not driven. This represents a zero being driven to the output interface. The fraction
sign bit (S1R), the PSL.N bit, and the exponent data bits are all driven to the output interface in
the same manner.

11.22.2 Fraction Operation 

The operations performed in the fraction datapath are shown in the table below. 



Table 11-19: Fraction Datapath Operations

Condition                                 Adder Floating Operation        SHIFT_EN

EFF_SUB AND FN=1 AND DeltaE=0             FD1R <- 0 + NOT FB + 1          Y
EFF_SUB AND FN=0 AND DeltaE=0             FD1R <- 0 + FB                  Y
EFF_ADD OR (EFF_SUB AND NOT DeltaE=0)     FD1R <- 0 + FB + Rndx           Y
MULf                                      FD1R <- 0 + FB + Rndx           Y
DIVf                                      FD1R <- 0 + FB + Rndx           Y
CVTif                                     FD1R <- 0 + FB + Rndx           Y
CVTff/MOV                                 FD1R <- 0 + FB + Rndx           Y
MNEG instruction                          FD1R <- 0 + FB                  N
CVTfi AND S1R=0                           FD1R <- 0 + FB                  N
CVTfi AND S1R=1                           FD1R <- 0 + NOT FB + 1          N
CMP/TST and PIPELINED CMP inst            FD1R <- 0                       N
MULL                                      FD1R <- 0 + FB                  N
Table 11-20: Fraction Datapath Operation Summary

           Conditions                        Inputs             Output
OPC      EDIFF Value   E1Z   E2Z   F_Z       FD1R    MILSBR     FD1R

E_A      X             0     X     X         V       X          SUM
E_A      X             X     0     X         V       X          SUM
E_A      =0            1     1     X         X       X          0
E_S      =0            X     X     0         V       X          SUM
E_S      =0            X     X     1         X       X          0
E_S      =0            1     1     X         X       X          0
E_S      >0            0     X     X         V       X          SUM
E_S      >0            X     0     X         V       X          SUM
MULf     X             0     0     X         V       X          SUM
MULf     X             1     X     X         X       X          0
MULf     X             X     1     X         X       X          0
DIVf     X             X     0     X         V       X          SUM
DIVf     X             X     1     X         X       X          0
CVTif    X             X     X     0         V       X          SUM
CVTif    X             X     X     1         X       X          0
CVTff    X             0     X     X         V       X          SUM
CVTff    X             1     X     X         X       X          0
MOV/N    X             0     X     X         V       X          SUM
MOV/N    X             1     X     X         X       X          0
CVTfi    X             0     X     X         V       X          SUM
CVTfi    X             1     X     X         X       X          0
MULL     X             X     X     X         V       V          SUM
CMP      V             V     V     V         X       X          0

E_A/E_S - Eff add/Eff subtract
MOV/N   - MOV/MNEG instruction
0       - Zero result
X       - Don't care
V       - Valid

The "=0" and ">0" under the EDIFF value column for E_A or E_S refer to the exponent difference
value being equal to zero or greater than zero respectively.

11.23 EXPONENT DATAPATH 



Figure 11-29: Block Diagram of Exponent Processor 
11.23.1 Exponent Block Description



The exponent block can be used for various functions. In stage 4 it is used to increment the stage 
3 exponent result. It is also used to detect the floating underflow and floating overflow conditions 
on the final result. The zero detector result is used for CVTfi overflow detection logic. The final 
exponent result is either the stage 3 result, or the stage 3 result incremented by one, (if there 
is overflow) or zero. As the selection of the final result is done near the end of a cycle, floating 
overflow and underflow are computed for all possible results and the correct one is chosen with 
the result. 

11.23.2 Exponent Operation 



In the exponent datapath, the stage 3 exponent result is incremented unconditionally for each 
instruction. Then, depending on the instruction and the fraction result, the correct exponent 
is selected. The three possible exponent results are: the stage 3 exponent result, the stage 
3 exponent result incremented by one, and zero. For instructions having integer as the final 
output, the exponent is a don't care. 



11.23.3 Floating Overflow and Underflow Detection 

Floating point overflow and underflow are detected on the output of the exponent adder as well as
on the exponent data (ED1R).

Floating point overflow requires detecting the case when the exponent is larger than the largest
biased exponent of 255 for F and D, and 2047 for G. The overflow is detected as follows, where
e<12:0> represents the exponent:

For F and D: overflow = NOT e<12> * (e<11> + e<10> + e<9> + e<8>)

For G: overflow = NOT e<12> * e<11>

The floating overflow signals, ED1R_OVF and E_AD1_OVF, are only asserted if an overflow is
detected and the appropriate enable signal is asserted. The enable signals, en_fd_type_l and
en_g_type_l, indicate whether a floating point operation is being performed and what the data
type is.

Floating point underflow requires detecting the case when the exponent is smaller than the
minimum exponent. Since the smallest biased exponent is 1 for F, D and G, the following logic
detects underflow:

For F, D and G:

underflow = e<12> + NOR (e<0> to e<12>), which reduces to
          = e<12> + NOR (e<0> to e<11>)

As with overflow, the underflow signals, ED1R_UNF and E_AD1_UNF, are asserted only if an
underflow is detected and one of the enable signals is asserted.

The overflow and underflow signals are selected as described in the output selector section. 
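
The overflow and underflow detection can be modelled in a few lines. This is a minimal sketch, assuming e<12> is the sign bit of a 13-bit two's complement exponent (which the underflow equation implies) and that the overflow term is therefore qualified by NOT e<12>; the function name and the g_type flag are hypothetical and the enable signals are not modelled.

```python
def exponent_flags(e: int, g_type: bool = False):
    """Return (overflow, underflow) for a 13-bit exponent e<12:0>."""
    def bit(n: int) -> int:
        return (e >> n) & 1

    sign = bit(12)
    if g_type:
        # G: largest biased exponent is 2047, so e<11> set means overflow.
        overflow = (sign ^ 1) & bit(11)
    else:
        # F and D: largest biased exponent is 255.
        overflow = (sign ^ 1) & (bit(11) | bit(10) | bit(9) | bit(8))

    # NOR(e<0> to e<11>): all twelve low bits are zero.
    low_bits_zero = int((e & 0xFFF) == 0)
    underflow = sign | low_bits_zero
    return bool(overflow), bool(underflow)
```

For example, an F or D exponent of 255 raises neither flag, 256 raises overflow, and 0 or any negative value raises underflow.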

11.23.4 Output Selector 

The output selector is used to select the output data from three different sources: ED1R, E_AD1 or
zero. This selection is done for the exponent output data (ED1R), the floating overflow (F_OVFR)
and the floating underflow (F_UNFR). The selection is based on the assertion of two control
signals, OSEL1_ZERO and SHFT_DONE.

SHFT_DONE, if asserted, selects the output from E_AD1; for overflow and underflow, SHFT_DONE
selects E_AD1_OVF and E_AD1_UNF. If SHFT_DONE is deasserted, then the output is selected from
ED1R; for overflow and underflow, SHFT_DONE deasserted selects ED1R_OVF and ED1R_UNF.
This selection is done using a 2 to 1 selector.

The selection of zero is done prior to the 2 to 1 selector described above. The selection for the
exponent result is done as follows. If the final result is known to be zero then a zero result is
selected. The PSL.Z bit (see below under miscellaneous logic) is asserted if the final result is
zero, which asserts OSEL1_ZERO. If OSEL1_ZERO is asserted, then the inputs from E_AD1 and
ED1R entering the 2 to 1 selector are both forced to zero. Then, since only one select line is used
to control the selector, the zero value will be transferred to the output regardless of the assertion
of SHFT_DONE.

The output of the selector is latched during PHI_2 of every cycle and driven to the BUS DRIVER 
section. 
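
The selection order described above (zero forced ahead of the 2 to 1 selector) can be sketched as follows. The function and argument names are assumptions for illustration; only the exponent data path is shown, not the overflow and underflow flags.

```python
def exponent_output(ed1r: int, e_ad1: int, shft_done: int, osel1_zero: int) -> int:
    """Model the stage 4 exponent output selector."""
    if osel1_zero:
        # Both selector inputs are forced to zero, so the single select
        # line (SHFT_DONE) no longer matters.
        ed1r, e_ad1 = 0, 0
    # 2 to 1 selector: incremented exponent if a shift was done.
    return e_ad1 if shft_done else ed1r
```

Forcing the inputs rather than adding a third select line is what allows a single control signal to drive the selector.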

BUS DRIVERS 

The BUS DRIVER section drives the final stage 4 result to the output interface on an active-low 
precharged bus, F_B%E_OUT_L<10:0>. This bus is shared with stage 3 which uses it to bypass 
stage 4 for certain instructions. The input to the BUS DRIVER section is F_4_E%ED1R_H<10:0>. 
During PHI_3, if stage 4's data_valid bit is set and the underflow condition is not detected, the 
inverted value of F_4_E%ED1R_H<10:0> is driven onto the bus. If underflow is detected, the bus 
is not driven. This represents a zero being driven to the output interface. 
Table 11-21: Exponent Datapath Operation Summary

           Conditions                        Input    Output
OPC      EDIFF Value   E1Z   E2Z   F_Z       ED1R     ED1R

E_A      X             0     X     X         V        V
E_A      X             X     0     X         V        V
E_A      =0            1     1     X         X        0
E_S      =0            X     X     0         V        V
E_S      =0            X     X     1         X        0
E_S      =0            1     1     X         X        0
E_S      >0            0     X     X         V        V
E_S      >0            X     0     X         V        V
MULf     X             0     0     X         V        V
MULf     X             1     X     X         X        0
MULf     X             X     1     X         X        0
DIVf     X             X     0     X         V        V
DIVf     X             X     1     X         X        0
CVTif    X             X     X     0         V        V
CVTif    X             X     X     1         X        0
CVTff    X             0     X     X         V        V
CVTff    X             1     X     X         X        0
MOV/N    X             0     X     X         V        V
MOV/N    X             1     X     X         X        0
CVTfi    X             X     X     X         X        X
MULL     X             X     X     X         X        X
CMP      X             X     X     X         X        0

E_A/E_S - Eff add/Eff subtract
MOV/N   - MOV/MNEG instruction
X       - Don't care
V       - Valid

The "=0" and ">0" under the EDIFF value column for E_A or E_S refer to the exponent difference
value being equal to zero or greater than zero respectively.
11.24 Control



Figure 11-30: Control Block Diagram 
11.24.1 Control Block Description

The control block supplies all the control signals for the various operations in stage 4 and also sends
the control information to the interface, delayed by a cycle. The control block gets its input from
stage 3.

11.24.2 Control Block Implementation 

The main control is implemented with a PLA. The inputs to the PLA are the opcode and bypass
signals. All the instruction information is encoded in FOP_FLOWR_H. The following control
information is decoded in the PLA: EFF_SUB, SHIFT_EN, MULL, CVTFI, RND, ENA_DET and
PCMPR. SHIFT_EN is asserted for CVTif, CVTff, ADD/SUB, DIVf, MULf. RND is asserted for
CVTif, CVTff, ADD/SUB, DIVf, MULf. ENA_DET is asserted for CVTff, ADD/SUB, DIVf, MULf,
CVTfi, CVTif.

The destination data type is decoded to get six signals, one for each data type. They are:
FTYPE, DTYPE, GTYPE, BYTE, WORD and LONG.

The logic used to generate the other control signals in stage 4 is as follows:

rnd_en_h = fop_flowr_h<2> * rnd_h * NOT (eff_sub AND e_diff_eql_0r)

fop_flowr_h<2>: If this signal is low for an instruction in pipelined mode requiring rnd,
                then rnd is disabled, i.e. it is a truncate mode.

rndf_h = rnd_en_h * ftype
rndd_h = rnd_en_h * dtype
rndg_h = rnd_en_h * gtype

cin_b58_h = e_diff_eql_0r_h * eff_sub * f_nr_h
cinb55_one_h = (cvtfi_h * s1r_h) + (signs_not_eqlr_h)

fb_neg_h is generated by stage 3 and sent as fb_neg4r_h to stage 4. The equation
implemented in stage 3 is:

fb_neg_h = (cin_b58_h + cinb55_one) * f_i%fbox_bypass_h
sel_other_h = pcmpr OR pslz_f_h

pslz_f_h: This signal will be high if the result for a floating destination
          is 0, and for a CMP instruction if both the operands are the same.



11.25 MISCELLANEOUS AND SIGN LOGIC 



Figure 11-31: Miscellaneous PLA Block Diagram

11.25.1 Miscellaneous Sign Logic Implementation

Stage 4 is used to find the sign of the final result, condition codes and exceptions. Specifically it 
does sign computation, integer overflow detection, zero result detection, negative result detection, 
reserved operands and floating divide by zero detection by utilizing the information provided from 
the previous stages of the pipe. If the result is zero, stage 4 will force its output to zero. In the 
case of floating underflow, the sign, PSLN_F_H, fraction, and exponent of the result are forced to 
zero. 
11.25.2 Sign and Negative Result Logic

The sign of the final result and the PSL.N status bits are the same except for CMP and TST 
instructions. For CMP and TST instructions, the sign bit is a don't care and the PSL.N bit is 
high if the first operand is strictly less than the second operand. If the final result is a zero 
then the sign bit and the PSL.N bit should be forced to zero. The PSL.Z (see below) bit is set 
if the result is zero, which is used to force the sign and the PSL.N bit to zero. For integer
instructions the sign is already in the result, and hence the sign is computed only for floating results.
Hence the PSL.N bit for a floating result, PSLN_F_H, is the same as the sign bit. The signals, PSL.N
and PSL.Z, are driven to the output interface on the active-low precharged bus which is shared 
with stage 3. During PHI_3, if stage 4's data_valid bit is set and the underflow condition is not 
detected, the inverted value of PSL.N is driven onto the bus. The inverted value of the PSL.Z 
signal is also driven onto the bus during PHI_3, if the data_valid bit is set, regardless of the 
underflow condition. The interface uses these signals to determine if the CMP condition is met 
or not. 

The PSL.N bit is obtained as below. 

If PSL.Z then PSL.N = 0

For EFFADD/EFFSUB the PSL.N bit of the result is given as follows.



For MULf and DIVf the PSL.N bit of the result is the XOR of the sign of the input operands.

PSL.N = signs_not_eql * (MULf + DIVf)

For MOV, CVTff and CVTif the PSL.N bit of the result is the sign of the input operand. For
the MNEG instruction the PSL.N bit of the result is the inverse of the sign of the input operand.

PSL.N = s1r * (MOV + CVTff + CVTif) + NOT s1r * MNEG

For CMP and TST instructions the PSL.N bit is

PSL.N = [signs_not_eql * s1r +
         signs_eql * {e_diff_eql_0 * (f_n XOR s1r) * NOT f_z +
                      NOT e_diff_eql_0 * (s1r XOR e_n)}] * (CMP + TST)

All the above computations are done in the miscellaneous PLA. As the number of minterms for
the psln logic was large, two signals are generated in the PLA, which are OR'ed outside and AND'ed
with PSLZ_F_H, to give the final PSLN_F_H. The sign has to be computed only for the instructions
considered above. For all the above instructions the final sign is either the PSL.N bit or it is a
don't care, hence

F_4%S1R_H = PSLN_F_H

For CVTfi and MULL, the PSL.N bit is the MSB of the final result. For MULL and CVTfi
(destination long), the MSB is SUM<B24>. For CVTfi with a destination of word the MSB is
SUM<B40> and with a destination of byte the MSB is SUM<B48>. Also, when the destination is
byte or word, the only instruction possible is CVTfi. Hence the PSL.N bit is

PSL.N = SUM<B24> * (LONG * CVTfi + MULL) +
        SUM<B40> * WORD + SUM<B48> * BYTE
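
The three MSB positions line up with ordinary byte, word and longword sign bits once the integer LSB is placed at B55. The sketch below illustrates this; it assumes the integer result occupies SUM<B24:B55> with B55 as the least significant bit (consistent with the zero detection ranges later in this section), and the names are hypothetical.

```python
# B-numbered position of the sign bit for each destination size.
MSB_B = {"byte": 48, "word": 40, "long": 24}
LSB_B = 55  # assumed position of the integer LSB


def psl_n_integer(sum_field: int, dest: str) -> int:
    """Return PSL.N for an integer result held in SUM<B24:B55>.

    sum_field is the 32-bit field as an integer, with B55 at bit 0.
    """
    shift = LSB_B - MSB_B[dest]  # 7, 15 or 31
    return (sum_field >> shift) & 1
```

B48, B40 and B24 therefore correspond to bits 7, 15 and 31 of the integer result.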
11.25.3 Integer Overflow

Integer overflow is possible for MULL and CVTfi instructions. The overflow condition for the 
convert floating to integer instruction is determined in stages 1, 3 and 4 of the pipe. For MULL 
instruction, the overflow is determined in stage 4. All these conditions are combined to give the 
integer overflow signal to the interface stage. 

OVERFLOW DETECTION FOR CVTfi 

The CVTfi instruction overflow detection operation performed in all the stages is given below. All 
constants are in decimal. 

Let the exponent of the input operand be E1; then the actual exponent of the floating number is
E1 - bias. Let that number be ACTUAL_EXP.

hence,

ACTUAL_EXP = E1 - bias, where bias = 128  ; for F and D type
                                   = 1024 ; for G type

Let DEST_LEN equal the length, in bits, of the destination result (8, 16 or 32).

For convert from floating to integers of length 8(B), 16(W), 32(LW), integer overflow occurs under
the following conditions.

1. if actual_exp > dest_len

2. if actual_exp = dest_len and s1r = 0

3. if actual_exp = dest_len and s1r = 1 and the integer portion
   is not equal to the most negative number

4. for CVT rounded to long only, in addition to the above conditions the
   following conditions have to be checked:

   a) if actual_exp = 31 and s1r = 0 and the 32 bits of the integer part
      are of the form 01111...111 and the remaining fraction is greater than
      or equal to 0.5.

   b) if actual_exp = 32 and s1r = 1 and the 32 bits of the integer part
      are of the form 10000...000 and the remaining fraction is greater than
      or equal to 0.5.
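
Conditions 1 to 3 can be checked against a plain range test. The sketch below is illustrative only: it relies on the VAX convention that a normalized fraction lies in [0.5, 1), so the truncated integer magnitude has exactly actual_exp significant bits, and it does not model the CVTRfL rounding cases in condition 4. All names are assumptions.

```python
def cvtfi_overflow(actual_exp: int, s1r: int, int_part: int, dest_len: int) -> bool:
    """Apply overflow conditions 1-3 for a truncating float-to-integer convert.

    int_part is the magnitude of the truncated integer; s1r is the sign.
    """
    most_negative = 1 << (dest_len - 1)  # magnitude of 100...00
    if actual_exp > dest_len:
        return True                      # condition 1
    if actual_exp == dest_len and s1r == 0:
        return True                      # condition 2
    if actual_exp == dest_len and s1r == 1 and int_part != most_negative:
        return True                      # condition 3
    return False


def fits_signed(value: int, dest_len: int) -> bool:
    """Reference check: is value representable in dest_len signed bits?"""
    return -(1 << (dest_len - 1)) <= value <= (1 << (dest_len - 1)) - 1
```

For a byte destination, a magnitude of 128 with a negative sign is the only value with actual_exp = 8 that does not overflow.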

The actual detection of the above conditions is done in stages 1, 3 and 4.
In stage 1 the following signals are generated.

lnegir - Least negative integer; high if <B0:B31> of F_I%FD1R_H are 1

mnegbr - Most negative byte; high if <B1:B7> of F_I%FD1R_H are 0

mnegwr - Most negative word; high if <B1:B15> of F_I%FD1R_H are 0

mneglr - Most negative longword; high if <B1:B31> of F_I%FD1R_H are 0

crfl_rndr - Convert floating to longword round bit; <B32> of F_I%FD1R_H

e_diff_eql_24r - exponent difference equals 24

e_diff_eql_25r - exponent difference equals 25

In stage 3 an exponent difference (see below) is done to determine the first three conditions for 
CVTfi overflow. The fourth condition for CVTRfL is also determined in stage 3. 
Let E1 be the exponent of the incoming operand in stage 3 and ER be the result of the subtraction
in stage 3. ER is sent to stage 4.

ER = (bias + dest_len) - E1
   = constant - E1

Above Constant Values

F,D --> B   C=136
F,D --> W   C=144
F,D --> L   C=160
G   --> B   C=1032
G   --> W   C=1040
G   --> L   C=1056
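
The six constants are simply bias + dest_len, which the following sketch reproduces. The dictionary and function names are assumptions for illustration.

```python
BIAS = {"F": 128, "D": 128, "G": 1024}   # exponent bias per source type
DEST_LEN = {"B": 8, "W": 16, "L": 32}    # destination length in bits


def er_constant(src: str, dst: str) -> int:
    """Constant subtracted from in stage 3: bias + dest_len."""
    return BIAS[src] + DEST_LEN[dst]


def er(src: str, dst: str, e1: int) -> int:
    """ER = constant - E1, i.e. dest_len - actual_exp."""
    return er_constant(src, dst) - e1
```

A negative ER therefore means actual_exp > dest_len, and ER = 0 means actual_exp = dest_len, which is how stage 4 tests the first three overflow conditions.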

Stage 3 sends out two signals, IV3 and IV4, to stage 4 for CVTfi overflow detection. They are
generated as follows.

iv3 = s1r * NOT mnegbr   for dest_len = 8  (f --> BYTE)
    = s1r * NOT mnegwr   for dest_len = 16 (f --> WORD)
    = s1r * NOT mneglr   for dest_len = 32 (f --> LONGWORD)

iv4 = (lnegir * NOT s1r * e_diff_eql_25r
       + mneglr * s1r * crfl_rndr * e_diff_eql_24) * (CVTRfL)

In stage 4 the following operations are performed. Let,

e1<12:0> = exponent result from stage 3 subtraction
e_n = the sign bit of e1, i.e. e1<12>
e1z = 1, if e1<12:0> is zero

The first two conditions are determined as

cvtfi_ovfl12 = e_n + e1z * NOT s1r

The third condition for CVTfi overflow is determined as:

cvtfi_ovfl3 = e1z * iv3

The fourth condition for overflow is given by iv4. Finally the CVTfi overflow is determined as

cvtfi_ovfl = (cvtfi_ovfl12 + cvtfi_ovfl3 + iv4) * (CVTfi)
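
The way stage 4 combines these terms can be sketched as below. This is a hedged reading of the equations: the complement on s1r in the first term is inferred from overflow condition 2 (actual_exp = dest_len with a positive sign), since the scan does not preserve overbars. The function signature is an assumption.

```python
def cvtfi_ovfl(e1: int, s1r: int, iv3: int, iv4: int, is_cvtfi: int = 1) -> int:
    """Combine the stage 3 results into the CVTfi overflow signal.

    e1 is the 13-bit stage 3 subtraction result (dest_len - actual_exp);
    iv3 and iv4 are the single-bit signals sent from stage 3.
    """
    e1 &= 0x1FFF
    e_n = (e1 >> 12) & 1            # sign bit of e1
    e1z = int(e1 == 0)              # e1<12:0> is zero

    ovfl12 = e_n | (e1z & (s1r ^ 1))   # conditions 1 and 2
    ovfl3 = e1z & iv3                  # condition 3
    return (ovfl12 | ovfl3 | iv4) & is_cvtfi
```

Note that iv3 only matters when e1z is set, which is why stage 3 can compute it unconditionally.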

OVERFLOW DETECTION FOR MULL 

For MULL integer overflow occurs if the high half of the double length result is not equal to the 
sign extension of the low half. The following condition is determined on MULL result to detect 
integer overflow. The register F_3%FD1R_H<B0:B32> contains the high 33 bits of the MULL
64-bit result.

mull_zero = NOR OF BITS fd1r(B0) THROUGH fd1r(B32) ;33 BITS
mull_one  = AND OF BITS fd1r(B0) THROUGH fd1r(B32) ;33 BITS
mull_ovf  = [NOT mull_zero * NOT mull_one] * MULL

The integer overflow is defined as: 

1_VR_H - cvtfi_ovfl + ( mull_ovf * f_iounfr_h) 
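
The MULL test (high 33 bits neither all zeros nor all ones) is equivalent to checking that the 64-bit product fits in a signed longword. The following sketch demonstrates the equivalence; it is an illustration with assumed names, not the detection logic itself.

```python
def mull_overflow(a: int, b: int) -> bool:
    """Return True if the signed 32 x 32 product does not fit in 32 bits.

    The product is taken as a 64-bit two's complement value; its high
    33 bits (bits 63:31) play the role of FD1R<B0:B32>.
    """
    product = (a * b) & ((1 << 64) - 1)
    high33 = product >> 31
    mull_zero = high33 == 0                # NOR of the 33 bits
    mull_one = high33 == (1 << 33) - 1     # AND of the 33 bits
    return (not mull_zero) and (not mull_one)
```

Bit 31 is included in the test because it must equal the sign extension held in bits 63:32.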
11.25.4 Zero Result

When the final result is zero then the zero flag (PSL.Z) bit has to be set. Different instructions 
are analyzed. 

For EFFADD/EFFSUB a zero result is possible when a) both the input operands are equal and
it is an effective SUB operation or b) both the input operands are zero.

PSL.Z = [(eff_sub * f_z * e_diff_eql_0) + (e1_z * e2_z)] * (ADD + SUB)

For a floating multiply instruction, a zero result is possible only when one or both of the input
operands are zero.

PSL.Z = (e1_z + e2_z) * MULf

For a floating divide instruction, a zero result is possible only when the dividend, i.e. the
second operand, is zero. When the first operand, the divisor, is zero then it is a floating divide by
zero. When a floating divide by zero occurs then the PSL.Z bit is a don't care.



For MOV/MNEG/CVTff instructions a zero result is possible only when the input operand is zero.



For CMP/TST instructions the zero flag has to be set when operand 1 is equal to operand 2. For the
TST instruction operand 2 is zero.

PSL.Z = (signs_eql * e_diff_eql_0 * f_z) * (CMP + TST)

For convert integer to floating instructions the result is zero if the input integer is zero.

PSL.Z = f_z * CVTif

All of the above computation is done in the miscellaneous PLA. The output of the miscellaneous 
PLA is PSLZ_F_H, as only the PSL.Z bit for floating instructions was considered. 

For integer multiply instructions and all convert floating to integer instructions, a zero result is 
possible for many different input operands. Hence the final result is checked for zero. 
For the CVTfi instruction, stage 4 is used to do a 2's complement. The 2's complement of zero 
is again zero, and the 2's complement of any non-zero number is not zero. Hence the zero 
condition can be detected at the input of stage 4 rather than at its output. For MULL the low 
order 32 bits of the result need to be checked for zero. The register MILSBR holds the 6 low 
bits of the 32-bit lsbs and register FD1R<B32:B57> holds the other 26 bits of the 32-bit lsb result. The 
conditions which are generated are as follows: 

f_4_d%zero_mil_h = NOR of f_3%fd1r_h<B56:B57> * NOR of f_3%milsbr_h<5:0> 
f_4_d%zero_byt_h = NOR of f_3%fd1r_h<B48:B55> 
f_4_d%zero_wor_h = NOR of f_3%fd1r_h<B40:B47> 
f_4_d%zero_mul_h = NOR of f_3%fd1r_h<B32:B39> 
f_4_d%zero_lon_h = NOR of f_3%fd1r_h<B24:B31> 
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The zero detection is done as follows, 

zero_b    = NOR OF FD1R(B48) THROUGH FD1R(B55)                  ;8 bits 
          = f_4_d%zero_byt_h 

zero_w    = NOR OF FD1R(B40) THROUGH FD1R(B55)                  ;16 bits 
          = f_4_d%zero_byt_h * f_4_d%zero_wor_h 

zero_l    = NOR OF FD1R(B24) THROUGH FD1R(B55)                  ;32 bits 
          = zero_w * f_4_d%zero_mul_h * f_4_d%zero_lon_h 

zero_mull = (NOR OF FD1R(B32) THROUGH FD1R(B57)) * 
            (NOR OF MILSBR(0) THROUGH MILSBR(5)) 
          = zero_w * f_4_d%zero_mul_h * f_4_d%zero_mil_h 

PSL.Z     = zero_b * byte + 
            zero_w * word + 
            zero_l * (long * CVTfi) + 
            zero_mull * MULL 
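
The grouping of partial NOR terms can be illustrated with a small software model. This is only a sketch under stated assumptions: the function names, the dictionary representation of FD1R (indexed by B-number), and the list representation of MILSBR are hypothetical conveniences, not part of the chip design.

```python
def nor(bits):
    """Return 1 only when every bit in the iterable is 0."""
    return int(not any(bits))


def zero_terms(fd1r, milsbr):
    """Model the grouped zero-detect terms.

    fd1r   -- dict mapping a B-index (24..57) to a bit value (0 or 1)
    milsbr -- sequence of the six MILSBR bits, index 0..5
    Both representations are assumptions made for this sketch.
    """
    # Partial NOR terms, one per bit group.
    zero_mil = nor([fd1r[56], fd1r[57]]) & nor(milsbr)
    zero_byt = nor(fd1r[b] for b in range(48, 56))
    zero_wor = nor(fd1r[b] for b in range(40, 48))
    zero_mul = nor(fd1r[b] for b in range(32, 40))
    zero_lon = nor(fd1r[b] for b in range(24, 32))

    # Wider zero conditions are ANDs of the partial terms.
    zero_b = zero_byt
    zero_w = zero_byt & zero_wor
    zero_l = zero_w & zero_mul & zero_lon
    zero_mull = zero_w & zero_mul & zero_mil
    return {"zero_b": zero_b, "zero_w": zero_w,
            "zero_l": zero_l, "zero_mull": zero_mull}
```

Building the wide conditions from shared partial terms means each bit group is examined once, and the byte, word, longword, and MULL results differ only in which terms are ANDed together.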



During PHI_3, if stage 4's data_valid bit is set, the inverted value of the PSL.Z bit is driven onto the 
active-low shared bus. 



11.25.5 Reserved Operand 

The reserved operand fault is checked in stage 4 of the pipe. A reserved operand fault is possible 
only when the input operand is of floating type. When a reserved operand fault occurs the other 
condition codes are overridden. The reserved operand detection is done in the miscellaneous PLA. 

For one operand instructions: 

RES.OPD = (f_3_c%e1zr_h * f_3%s1r_h) * (MOV + MNEG + CVTff + CVTfi + TST) 

For two operand instructions: 

RES.OPD = (f_3_c%e1zr_h * f_3%s1r_h + f_3_c%e2zr_h * f_3%s2r_h) * 
          (ADD + SUB + DIVf + MULf + CMP) 
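
As an illustration of the condition being detected, the following sketch tests a VAX F_floating longword for the reserved operand encoding (sign bit set with a zero exponent field). The function names are hypothetical, and the sketch works on the memory format of the operand rather than on the Fbox internal exponent-zero and sign signals.

```python
def f_floating_fields(longword):
    """Extract (sign, exponent) from a VAX F_floating longword.

    In the F_floating memory format the sign is bit 15 and the
    excess-128 exponent occupies bits 14:7.
    """
    sign = (longword >> 15) & 1
    exponent = (longword >> 7) & 0xFF
    return sign, exponent


def is_reserved_operand(longword):
    """Reserved operand: exponent field zero and sign bit one."""
    sign, exponent = f_floating_fields(longword)
    return sign == 1 and exponent == 0


def is_true_zero(longword):
    """True zero: exponent field zero and sign bit zero."""
    sign, exponent = f_floating_fields(longword)
    return sign == 0 and exponent == 0
```

For example, `is_reserved_operand(0x00008000)` holds, while the F_floating encoding of 1.0 (`0x00004080`) is neither a reserved operand nor a zero.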



11.25.6 Floating Divide by Zero 

When a floating divide by zero occurs, the f_div_by_zero bit has to be set. The floating divide by 
zero fault occurs if operand 1 is zero. The logic is implemented in the miscellaneous PLA. 



f_div_by_zero = [f_3_c%e1zr_h * f_3%s1r_h] * (DIVf) 
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11.26 FBOX TESTABILITY 

This section describes the FBOX_Test mode of operation. FBOX_Test mode would primarily be used 
during chip debug and possibly during manufacturing tests. 

11.26.1 FBOX_Test Control Signals 

Two FBOX input signals are associated with FBOX_Test mode. E%FBOX_TEST_ENB_H 
is received from the EBOX, latched during PHI1, and driven down the FBOX pipe as 
F_I%FBOX_BYPASS_H. Assertion of E%FBOX_TEST_ENB_H puts the FBOX into FBOX_Test mode. 
A second signal, E%FBOX_S4_BYPASS_ENB_H, selects between two slightly different 
variants of FBOX_Test mode. E%FBOX_S4_BYPASS_ENB_H is received from the EBOX by a PHI1 
latch and driven into the Fbox core as F_I%S4_BYPASS_ENB_H by a following PHI3 latch. 

11.26.2 FBOX_Test Mode Description 

FBOX_Test mode allows simple testing of the FBOX fraction and exponent datapaths. When 
in FBOX_Test mode, the basic operation of each stage is to pass fraction and exponent data, 
unchanged, from its input to its output. Thus, the test mode features allow FD1R or FD2R to 
be passed through the fraction datapath and ED1R to be passed through the exponent datapath. 
Selection of whether to pass FD1R or FD2R to the Fbox output is done, in Stage3, by looking at 
the value of F_I%S4_BYPASS_ENB_H. SIGN bit processing is not affected by FBOX_Test mode. 



11.26.2.1 FBOX Section Operation During FBOX_Test Mode 

Input and Output — The Input and Output sections of the FBOX operate as normal. 

Divider — In the Divider, F_I%FBOX_BYPASS_H assertion forces F_D_C%DIVDONE_DAT_H to be 
asserted to Stage1, effectively bypassing the Divider. This enables Stage1 to use data supplied by 
the Input interface as the result of the Divider stage. 

Stage1 — In Stage1, F_I%FBOX_BYPASS_H assertion forces the Stage1 output register select signals 
to a state that writes the Stage1 FD1R, FD2R, and ED1R output registers with the contents of 
the Input interface FD1R, FD2R, and ED1R respectively. 

Stage2 — In Stage2, F_I%FBOX_BYPASS_H assertion forces right-shifter control to a "shift_of_zero" 
in order to pass FD1R through Stage2. Output register select signals are forced to a state 
which writes the Stage2 FD1R and ED1R output registers with the contents of the Stage1 FD1R 
and ED1R. Stage2 FD2R is always written with the contents of Stage1 FD2R irrespective of 
FBOX_Test mode. 

Stage3 — In Stage3, F_I%FBOX_BYPASS_H assertion forces left-shifter control to a "shift_of_zero" 
in order to pass FD1R or FD2R through Stage3. Selection of whether to pass FD1R or FD2R is 
done by the value on F_I%S4_BYPASS_ENB_H and the output is on Stage3's FD1R. Stage3 ED1R output 
is written with Stage2 ED1R input while in FBOX_Test mode. Stage3 fraction output selectors 
are forced to output the contents of the left shifter during FBOX_Test mode. The following 
table describes Stage3 operation modes and the data driven on various busses for different modes of 
operation. 
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The main features of this implementation are: 

o Either FD1R or FD2R can be selected to pass directly through the FBOX 

o The two shared busses between Stages 3/4 and the output interface can 
be selectively driven by Stage 3 or Stage 4. 

o Provides visibility of the Stage3 miniround incrementer results. 
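Stage 3 operation modes, selected by F_I%FBOX_BYPASS_H and F_I%S4_BYPASS_ENB_H, and the 
values appearing on the busses: 

F_I%FBOX_BYPASS_H, F_I%S4_BYPASS_ENB_H = 00 
   Stage 3 operation mode:        Normal operation with S4_Bypass = OFF 
   Miniround incrementer input:   Opcode dependent 
   F_B%F_OUT_L<B1:B55>:           Stage 4 fraction result 
   F_3%FD1R_H<A0:B58>:            Stage 3 fraction result 
   F_B%E_OUT_L<10:0>:             Stage 4 exponent result 

F_I%FBOX_BYPASS_H, F_I%S4_BYPASS_ENB_H = 01 
   Stage 3 operation mode:        Normal operation with S4_Bypass = ON 
   Miniround incrementer input:   Opcode dependent 
   F_B%F_OUT_L<B1:B55>:           Stage 3 fraction result if Stage 4 bypassed, else Stage 4 fraction 
   F_3%FD1R_H<A0:B58>:            Stage 3 fraction result 
   F_B%E_OUT_L<10:0>:             Stage 3 exponent result if Stage 4 bypassed, else Stage 4 exponent 

F_I%FBOX_BYPASS_H, F_I%S4_BYPASS_ENB_H = 10 
   Stage 3 operation mode:        S4_Bypass = OFF 
   Miniround incrementer input:   Opcode dependent 
   F_B%F_OUT_L<B1:B55>:           Stage 4 driven FD1R_H<B1:B55> 
   F_3%FD1R_H<A0:B58>:            FD1R_H<A0:B58> 
   F_B%E_OUT_L<10:0>:             Stage 4 driven ED1R_H<10:0> 

F_I%FBOX_BYPASS_H, F_I%S4_BYPASS_ENB_H = 11 
   Stage 3 operation mode:        S4_Bypass = ON, bypassable opcode 
   Miniround incrementer input:   Opcode dependent ** (see footnote) 
   F_B%F_OUT_L<B1:B55>:           FD2R_H<B1:B55> ** 
   F_3%FD1R_H<A0:B58>:            (illegible) 
   F_B%E_OUT_L<10:0>:             ED1R_H<10:0> 

F_I%FBOX_BYPASS_H, F_I%S4_BYPASS_ENB_H = 11 
   Stage 3 operation mode:        S4_Bypass = ON, non-bypassable opcode 
   Miniround incrementer input:   Opcode dependent * 
   F_B%F_OUT_L<B1:B55>:           FD2R_H<B1:B55> * 
   F_3%FD1R_H<A0:B58>:            FD2R_H<A0:B58> * 
   F_B%E_OUT_L<10:0>:             ED1R_H<10:0> 

* - All fraction data bits are passed through Stage 3, as received, by way of the left shifter. 
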

** - In FBOX_Test mode, with S4_Bypass on and a bypassable opcode in Stage3, the majority but not all of fraction 
bits are passed through Stage 3, as received, by way of the left shifter and the output selector choosing 
shifter output. 

For F-type data two fraction bits (B22:B23) are passed through Stage3 by way of the miniround incrementer. 
Similarly, for D-type data six fraction bits (B50:B55) and for G-type data three fraction bits (B50:B5 
are passed through Stage3's miniround incrementers. 

It is important to note that the control logic for the miniround input selectors makes its selection 
on opcode information and the signal F_3_A%SHFT_DONE_H. FBOX_Test mode is not factored into the miniround 
incrementer's input selector control. Depending on the opcode and exponent difference, miniround input 
could choose left shifter output or fraction adder output to be fed to the miniround incrementers. 

The simplest way to pass FD2R through Stage3 (unchanged) is to select the proper opcode 
and data such that an effective subtract with an exponent difference of zero will enter Stage3. 
This will select Stage 3's left shifter output as the source for the miniround incrementer input 
and the round bit position will be zero. 

Stage4 — In Stage4, F_I%FBOX_BYPASS_H assertion forces fraction adder carry-in and round 
signals to zero to allow FD1R to pass through Stage3 unchanged. Stage4 FD1R and ED1R 
are written with the contents of Stage3 FD1R and ED1R respectively. 
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11.26.3 Revision History 

Table 11-22: Revision History 

Who             When          Description of change 

Anil Jain       17-Mar-1989   Initial Release 
Anil Jain       18-Dec-1989   Updated to reflect the Fbox implementation 
Dave Deverell   25-Jan-1991   Updated to reflect PASS1 implementation and FBOX_Test section added 
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Chapter 12 
The Mbox 



12.1 INTRODUCTION 

The Mbox performs three primary functions: 

• VAX memory management: The Mbox, in conjunction with the operating system memory 
management software, is responsible for the allocation and use of physical memory. The 
Mbox performs the hardware functions necessary to implement VAX memory management. 
It performs translations of virtual addresses to physical addresses, access violation checks 
on all memory references, and initiates the invocation of software memory management code 
when necessary. 

• Reference processing: Due to the macropipeline structure of NVAX, and the coupling between 
NVAX and its memory subsystem, the Mbox can receive memory references from the Ibox, 
Ebox and Cbox simultaneously. Thus, the Mbox is responsible for prioritizing, sequencing, 
and processing all references in an efficient and logically correct fashion and for transferring 
references and their corresponding data to/from the Ibox, Ebox, Pcache, and Cbox. 

• Primary Cache Control: The Mbox maintains an 8KB physical address cache of I-stream and 
D-stream data. This cache, called the Pcache (Primary Cache), exists in order to provide a two 
cycle pipeline latency for most I-stream and D-stream data requests. It is the fastest D-stream 
storage medium for NVAX and represents the first level of D-stream memory hierarchy and 
the second level of I-stream memory hierarchy for the NVAX computer system. The Mbox is 
responsible for controlling Pcache operation. 
12.2 MBOX STRUCTURE 

This section presents a block diagram of the Mbox and defines the function of the basic Mbox 
components. This section neither explains why the functions of each component exist nor does it 
discuss the interactions among the components. The intent of this section is only to define the 
function and interconnection of the components for future discussion. Subsequent sections will 
deal with component interaction. 

The following block diagram illustrates the basic components of the Mbox. 
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Figure 12-1 : Mbox Block Diagram 
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The Mbox is implemented as a two-stage pipeline located in the fifth and sixth segments of the 
NVAX macropipeline (S5 and S6). References processed by the Mbox are first executed in S5. 
Upon successful completion in S5, the reference is transferred into S6. At this point, the reference 
has either completed or is transferred to the Ibox, Ebox, or Cbox. 

During any cycle, the fundamental state of the S5 and S6 stages can be defined by the particular 
references which currently reside in these two stages. For the purposes of describing the Mbox, 
all references can be viewed as a packet of information which is transferred on the S5 and S6 
buses. The S5 reference packet and the corresponding S5 buses are defined as: 

• ADDRESS: The M_QUE%S5_VA_H<31:0> bus transfers all virtual addresses and some physical 
addresses into the S5 pipe. The M_QUE%S5_PA_H<31:0> bus transfers some physical addresses 
into the S5 pipe and transfers all addresses out of the S5 pipe. 

• DATA: M_QUE%S5_DATA_H<31:0> transfers data originating from the Ebox through the S5 
pipe. 

• COMMAND: M_QUE%S5_CMD_H<4:0> transfers the type of reference through the S5 pipe. This 
command field is defined in Section 12.3.1. 

• TAG: M_QUE%S5_TAG_H<4:0> transfers the Ebox register file destination address 
corresponding to the reference through the S5 pipe. 

• DEST_BOX: M_QUE%S5_DEST_H<1:0> transfers the reference destination information through 
the S5 pipe. This field is defined as follows: 



M_QUE%S5_DEST_H   Definition 

00                the reference requests data destined for the Mbox. 
01                the reference requests data destined for the Ibox. 
10                the reference requests data destined for the Ebox. 
11                the reference requests data destined for the Ebox and Ibox. 



• AT: M_QUE%S5_AT_H<1:0> transfers the access type of the reference. This field is defined 
as follows: 

M_QUE%S5_AT_H   Definition 

00              TB passive query access (see PROBE command) 
01              read access 
10              write access 
11              modify access (read with write check for future write to same address) 

• DL: M_QUE%S5_DL_H<1:0> transfers the data length of the reference. This field is defined 
as follows: 

M_QUE%S5_DL_H   Definition 

00              byte 
01              word 
10              longword 
11              quadword 



• REF_QUAL: M_QUE%S5_QUAL_H<6:0> transfers information which further qualifies the 
reference for the purpose of Mbox processing. This field is defined as follows: 

M_QUE%S5_QUAL_H bit    Definition 

M_QUE%S5_QUAL_H<6>     address of reference is currently a virtual address. 
M_QUE%S5_QUAL_H<5>     reference has been tested for cross-page condition. 
M_QUE%S5_QUAL_H<4>     reference is first part of an unaligned reference. 
M_QUE%S5_QUAL_H<3>     reference is second part of an unaligned reference. 
M_QUE%S5_QUAL_H<2>     enable ACV and M=0 checks. 
M_QUE%S5_QUAL_H<1>     reference has or is forced to have a hard error. 
M_QUE%S5_QUAL_H<0>     reference has or is forced to have a memory management fault (ACV/TNV/M=0). 
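
The S5 packet fields defined above can be summarized in a small decoding sketch. The class name, flag names, and the `describe` helper are hypothetical labels chosen for illustration; only the bit positions and field encodings come from the definitions in this section.

```python
from enum import IntFlag


class S5Qual(IntFlag):
    """Bit assignments of the 7-bit S5 qualifier field (names are illustrative)."""
    MME_FAULT = 1 << 0         # has or is forced to have an MME fault
    HARD_ERROR = 1 << 1        # has or is forced to have a hard error
    ENABLE_ACV_CHECK = 1 << 2  # enable ACV and M=0 checks
    SECOND_UNALIGNED = 1 << 3  # second part of an unaligned reference
    FIRST_UNALIGNED = 1 << 4   # first part of an unaligned reference
    XPAGE_CHECKED = 1 << 5     # tested for cross-page condition
    VIRTUAL = 1 << 6           # address is currently virtual


DEST = {0: "Mbox", 1: "Ibox", 2: "Ebox", 3: "Ebox and Ibox"}
AT = {0: "TB passive query", 1: "read", 2: "write", 3: "modify"}
DL = {0: "byte", 1: "word", 2: "longword", 3: "quadword"}


def describe(dest, at, dl, qual):
    """Decode the 2-bit DEST, AT, DL fields and the 7-bit qualifier."""
    return {
        "dest": DEST[dest & 3],
        "at": AT[at & 3],
        "dl": DL[dl & 3],
        "qual": S5Qual(qual & 0x7F),
    }
```

For instance, `describe(0b10, 0b01, 0b10, 0b1000100)` describes a longword read destined for the Ebox whose address is still virtual and for which ACV and M=0 checks are enabled.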

The S6 reference packet and the corresponding S6 buses are defined as: 

• ADDRESS: The M%S6_PA_H<31:0> bus transfers a physical address through the S6 pipe. 

• DATA: B%S6_DATA_H<63:0> transfers data through the S6 pipe. 

• COMMAND: M%S6_CMD_H<4:0> transfers the type of reference through the S6 pipe. This 
command field is defined in Section 12.3.1. 

• DEST_BOX: M_QUE_MS2%S6_DEST_H<1:0> transfers the reference destination information 
through the S6 pipe. This field is defined as follows: 

M_QUE_MS2%S6_DEST_H   Definition 

00                    the reference requests data destined for the Mbox. 
01                    the reference requests data destined for the Ibox. 
10                    the reference requests data destined for the Ebox. 
11                    the reference requests data destined for the Ebox and Ibox. 



• S6_BYTE_MASK: M%S6_BYTE_MASK_H<7:0> transfers the byte mask information through the 
S6 pipe. The byte mask field is used to indicate which bytes of a longword or quadword write 
should actually be written to a cache or memory. 

• REF_QUAL: M_QUE_MS2%S6_QUAL_H<3:0> transfers information which further qualifies the 
reference for the purpose of Mbox processing. This field is defined as follows: 

M_QUE_MS2%S6_QUAL_H bit    Definition 

M_QUE_MS2%S6_QUAL_H<3>     reference is first part of an unaligned reference. 
M_QUE_MS2%S6_QUAL_H<2>     reference is second part of an unaligned reference. 
M_QUE_MS2%S6_QUAL_H<1>     reference has or is forced to have a hard error. 
M_QUE_MS2%S6_QUAL_H<0>     reference has or is forced to have a memory management fault (ACV/TNV/M=0). 



12.2.1 IREF_LATCH 

The IREF_LATCH is a latch which stores all I-stream read references (IREADs) requested by 
the Ibox. Each IREAD is stored in the IREF_LATCH until the reference successfully completes 
in S5. 

The following figure illustrates the structure of the IREF_LATCH: 
Figure 12-2: IREF_LATCH 

[Block diagram. Labels recoverable from the figure: QW_INCREMENTOR, S5_VA<31:0>, IBOX_DEST, 
S5_DEST<1:0>, READ ACCESS, S5_AT<1:0>, S5_DL<1:0>, S5_QUAL<6> through S5_QUAL<0>, the 
constants FALSE and TRUE, MBOX_FORCE_HARD_FAULT, and MBOX_FORCE_MME_FAULT.] 


The output of the address field of the IREF_LATCH has an incrementer associated with it in 
order to increment the quadword address. The output of this structure can be tristated. 

See Section 12.3.5.2 for a more complete understanding of IREF_LATCH function in the context 
of overall Mbox operation. 
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12.2.2 SPEC_QUEUE 

The SPEC_QUEUE is a 2-entry FIFO structure which stores D-stream read and write references 
associated with specifier source and destination operands decoded by the Ibox. Each reference 
latched in the SPEC_QUEUE is stored until the reference successfully completes in S5. If the 
reference is unaligned, the entire reference must complete in S5 before the corresponding entry 
is invalidated. 

The following figure illustrates the structure of the SPEC_QUEUE: 



Figure 12-3: Spec Queue 

[Block diagram. Labels recoverable from the figure: inputs IBOX_CMD<4:0>, IBOX_ADDR<31:0>, 
IBOX_TAG<2:0>, IBOX_DEST<1:0>, IBOX_AT<1:0>, IBOX_DL<1:0>, and NOT STOP_SPEC_Q<0>; entry 
fields VALID BIT, DESTINATION, ACCESS TYPE, and DATA LENGTH; outputs S5_CMD<4:0>, S5_VA<31:0>, 
S5_TAG<4:0>, S5_DEST<1:0>, S5_DL<1:0>, and S5_QUAL<6> through S5_QUAL<0>; qualifier sources 
XPAGE_CHECKED, FORCE_HARD_FAULT<0>, MBOX_FORCE_HARD_FAULT<0>, FORCE_MME_FAULT<0>, and 
MBOX_FORCE_MME_FAULT<0>.] 


The output of this structure can be tristated. 
12.2.3 EM_LATCH 

The EM_LATCH latches and stores all commands originating from the Ebox. Each reference is 
stored until the following two conditions are satisfied: 1) the "complete logical reference" (i.e. 
the pair of aligned references required if the EM_LATCH reference is unaligned) clears memory 
management access checks, and 2) the EM_LATCH reference successfully completes in S5. 

The following figure illustrates the structure of the EM_LATCH: 
Figure 12-4: EM_LATCH 

[Block diagram. Labels recoverable from the figure: inputs EBOX_CMD<4:0>, VA_BUS<31:0>, 
W_BUS<31:0>, EBOX_TAG<4:0>, EBOX_AT<1:0>, EBOX_DL<1:0>, and EBOX_VIRT_ADDR; fields ACCESS TYPE 
and DATA LENGTH; a 4-WAY BYTE ROTATOR on the data field; outputs S5_CMD<4:0>, S5_VA<31:0>, 
S5_DATA<31:0>, S5_TAG<4:0>, S5_DEST<1:0>, S5_AT<1:0>, S5_DL<1:0>, and S5_QUAL<6> through 
S5_QUAL<0>; qualifier sources XPAGE_CHECKED, XPAGE_COND, NO_MME_CHECK, MBOX_FORCE_HARD_FAULT, 
and MBOX_FORCE_MME_FAULT; FLUSH_PA_QUEUE and FLUSH_PAQ.] 


A 4-way byte barrel shifter is connected to the data portion of the EM_LATCH. This enables the 
write data to be byte-rotated into longword alignment. 



The EM_LATCH output can be tristated. 
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12.2.4 VAP_LATCH 

The function of the VAP_LATCH is to create and store the second reference of an unaligned 
reference pair. Each reference is stored until the reference successfully completes in S5. The 
following figure illustrates the structure of the VAP_LATCH: 

Figure 12-5: VAP_LATCH 

[Block diagram. Labels recoverable from the figure: inputs taken from the S5 buses (S5_CMD<4:0>, 
S5_VA<31:0>, S5_DATA<31:0>, S5_TAG<4:0>, S5_DEST<1:0>, S5_AT<1:0>, S5_DL<1:0>, S5_QUAL bits); 
a QUADWORD INCREMENTOR on the address input; fields COMMAND, ADDRESS, DESTINATION, ACCESS TYPE, 
DATA LENGTH, TAG, and VIRT/PHYS; the constant TRUE; outputs S5_CMD<4:0>, S5_VA<31:0>, 
S5_DATA<31:0>, S5_TAG<4:0>, S5_DEST<1:0>, S5_AT<1:0>, S5_DL<1:0>, and S5_QUAL<6> through 
S5_QUAL<0>.] 


The VAP_LATCH transforms the current S5 reference into a new reference. Thus, input for 
the VAP_LATCH is taken off of the S5 buses. An incrementor exists on the input side of the 
address field which adds eight to M_QUE%S5_VA_H<31:0> in order to create the second reference in 
an unaligned pair of references. 
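
The address arithmetic can be sketched as follows. The add-eight step comes from the text; the test for when a reference needs a second part is an assumption made for this sketch (a reference is treated as unaligned when it spans two aligned quadwords), and all names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Data length encoding -> size in bytes (byte, word, longword, quadword).
DL_BYTES = {0: 1, 1: 2, 2: 4, 3: 8}


def crosses_quadword(va, dl):
    """Assumed unaligned test: the access spills past its aligned quadword."""
    return (va & 7) + DL_BYTES[dl] > 8


def second_reference_va(va):
    """Address of the second reference: the S5 virtual address plus eight."""
    return (va + 8) & MASK32
```

Under this assumption a longword read at `0x1006` needs a second reference at `0x100E`, which falls in the next aligned quadword, while a longword read at `0x1004` does not.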



The VAP_LATCH output can be tristated. 
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See Section 12.3.17 for a more complete understanding of VAP_LATCH function in the context of 
overall Mbox operation. 

12.2.5 MME_LATCH 

The MME_LATCH (Memory Management Exception Latch) stores references associated with 
memory management processing. It acts as a buffer between the S5 processing pipe and the 
MME_DATAPATH. The MME_LATCH is the S5 source for PTE references (page table entry 
reads), PTE data, Mbox internal processor registers, and TB fill operations. 

The following figure illustrates the structure of the MME_LATCH: 
Figure 12-6: MME_LATCH 

[Block diagram. Labels recoverable from the figure: inputs MME_CMD_GEN<4:0>, MME_ALU<31:0>, 
MD_BUS<31:0>, MME_TAG<4:0>, MME_DEST<1:0>, MME_AT<1:0>, MME_DL<1:0>, MME_VIRT_ADDR, and 
MME_ENABLE_ACV_CHK; fields COMMAND, DESTINATION, ACCESS TYPE, DATA LENGTH, and VIRT/PHYS; 
outputs S5_CMD<4:0>, S5_VA<31:0>, S5_DATA<31:0>, S5_TAG<4:0>, S5_DEST<1:0>, S5_DL<1:0>, and 
S5_QUAL<6> through S5_QUAL<0>.] 


Each reference is stored until the reference successfully completes in S5. 
The MME_LATCH output can be tristated. 
12.2.6 RTY_DMISS_LATCH 

The RTY_DMISS_LATCH stores D-stream reads which missed in the Pcache when a previous 
D-stream fill sequence has not yet completed. This latch is the mechanism by which a D-stream 
read, which missed in the S6 pipe during another D-stream fill sequence, can be retried in the 
S5 pipe at some later point. 

An S6 D-stream read is loaded into the RTY_DMISS_LATCH when it misses in the Pcache while 
a previous D-stream fill sequence is in progress. A RTY_DMISS_LATCH reference is driven into the S5 
pipe during or after the point when the final D_CF reference is executing in S6 to complete the 
previous fill sequence. A RTY_DMISS_LATCH reference is invalidated when its read is retired 
from S5. 

The following figure illustrates the structure of the RTY_DMISS_LATCH: 
Figure 12-7: RTY_DMISS_LATCH 

[Block diagram. Labels recoverable from the figure: VALID BIT; inputs S6_CMD, S6_PA<31:0>, 
S6_TAG<4:0>, S6_DEST<1:0>, S6_DL<1:0>, S6_QUAL<4>, and S6_QUAL<3>; fields COMMAND, ADDRESS, 
TAG, DESTINATION, and DATA LENGTH; the constants TRUE and FALSE; outputs S5_CMD<4:0>, 
S5_PA<31:0>, S5_TAG<4:0>, S5_DEST<1:0>, S5_DL<1:0>, and S5_QUAL<6> through S5_QUAL<0>.] 


The RTY_DMISS_LATCH output can be tristated. 

See Section 12.3.5.3.1 for a more complete understanding of RTY_DMISS_LATCH function in the 
context of overall Mbox operation. 
12.2.7 CBOX_LATCH 

The CBOX_LATCH stores references originating from the Cbox. These references are I-stream 
Pcache fills, D-stream Pcache fills, or Pcache hexaword invalidates. 

Each reference is stored until the reference successfully completes in S5. 

The following figure illustrates the structure of the CBOX_LATCH: 

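Figure 12-8: CBOX_LATCH 

[Block diagram. Recoverable content: CBOX_REQ<0> loads the VALID BIT; CBOX_CMD<1:0> loads the 
COMMAND field, which drives S5_CMD<4:0>; CBOX_ADDR<31:5> and MBOX_FILL_QW<4:3> load the ADDRESS 
field, which drives S5_PA<31:0>; the DESTINATION field drives S5_DEST<1:0>; QUADWORD DL drives 
S5_DL<1:0>. The qualifier outputs are driven as follows: S5_QUAL<6> = FALSE, S5_QUAL<5> = TRUE, 
S5_QUAL<4> = FALSE, S5_QUAL<3> = FALSE, S5_QUAL<2> = FALSE, S5_QUAL<1> = HARD ERR (loaded from 
HARD_ERROR<0>), S5_QUAL<0> = FALSE.] 
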

Note that no data field is present in this latch even though this latch services cache fill commands. 
Cache fill data will be supplied to the Pcache on the B%S6_DATA_H bus by the Cbox during the 
appropriate S6 cache fill cycle. The C%CBOX_ADDR_H bus is driven by the Cbox during invalidate 
commands. During cache fill commands, all but two bits of the C%CBOX_ADDR_H bus are driven by 
the DMISS_LATCH or IMISS_LATCH. The Cbox will drive C%MBOX_FILL_QW_H<4:3> during cache 
fill commands in order to supply the quadword alignment of the fill data within the hexaword 
block. 

The CBOX_LATCH output can be tristated. 



12.2.8 PA_QUEUE 

The PA_QUEUE (Physical Address Queue) stores the physical addresses associated with 
destination specifier references made by the Ibox via a DEST_ADDR or READ_MODIFY command. 
The Ebox will supply the corresponding data at some later time via a STORE command. When 
the STORE data is supplied, the PA_QUEUE address is matched with the STORE data and the 
reference is turned into a physical WRITE operation. 

The following figure illustrates the structure of the PA_QUEUE: 



Figure 12-9: PA_QUEUE 

[Block diagram. Labels recoverable from the figure: 8 ENTRIES DEEP; entry fields VALID BIT, 
ADDRESS, and DATA LENGTH; input and output S5_PA<31:0> and S5_DL<1:0>; comparator output 
PA_QUEUE_CONFLICT; S5_QUAL<6> through S5_QUAL<0>; the constants FALSE and TRUE; 
MBOX_FORCE_HARD_FAULT<0> and MBOX_FORCE_MME_FAULT<0>.] 


The PA_QUEUE is organized as an 8-entry FIFO. Addresses from the Ibox are expected in the 
same order as the corresponding data from the Ebox. 

The PA_QUEUE has address comparators built into all FIFO entries. These comparators detect 
when the physical address bits <8:3> of a valid PA_QUEUE entry match the corresponding 
physical address bits of an Ibox D-stream read. 
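
A minimal behavioral sketch of this structure is shown below. It models only the FIFO ordering and the partial-address comparison on bits <8:3>; the class and method names are hypothetical and the model ignores valid-bit timing, data length, and qualifier fields.

```python
from collections import deque


class PAQueue:
    """Illustrative model of an 8-entry FIFO of destination physical addresses."""

    DEPTH = 8

    def __init__(self):
        self._entries = deque()

    def push(self, pa):
        """Queue a translated destination address (from DEST_ADDR or a modify read)."""
        if len(self._entries) >= self.DEPTH:
            raise OverflowError("PA_QUEUE full")
        self._entries.append(pa & 0xFFFFFFFF)

    def pop_for_store(self):
        """Pair the oldest address with arriving STORE data (FIFO order)."""
        return self._entries.popleft()

    def conflicts(self, read_pa):
        """Compare bits <8:3> of a read address against every valid entry."""
        key = (read_pa >> 3) & 0x3F
        return any(((pa >> 3) & 0x3F) == key for pa in self._entries)
```

Because only six address bits are compared, the check is conservative: a read to `0x5038` reports a conflict with a queued write to `0x1238` even though the full addresses differ.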

See Section 12.3.6.1 and Section 12.3.18.1.1 for a more complete understanding of PA_QUEUE 
function in the context of overall Mbox operation. 

12.2.9 TB 

The TB (translation buffer) is the mechanism by which the Mbox performs quick virtual-to- 
physical address translations. It is a 96-entry fully associative cache of PTEs (Page Table Entries). 
Bits 31 through 9 of all S5 virtual addresses act as the TB tag. The replacement algorithm 
implemented is Not-Last-Used. 

See Section 12.5.1.3 for more information. 
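
The tag selection can be illustrated with a short sketch. Splitting at bit 9 follows from the text and from the 512-byte VAX page size; the function names and the use of a dictionary to stand in for the fully associative array are assumptions made for illustration, and replacement is not modeled.

```python
def split_va(va):
    """Split a 32-bit virtual address into (TB tag, byte-within-page).

    The tag is VA<31:9>; VA<8:0> is the offset within a 512-byte page.
    """
    va &= 0xFFFFFFFF
    return va >> 9, va & 0x1FF


def tb_lookup(tb, va):
    """Look up a translation in a dict mapping tag -> page frame number.

    Returns the physical address on a hit, or None on a miss.
    """
    tag, offset = split_va(va)
    pfn = tb.get(tag)
    if pfn is None:
        return None
    return (pfn << 9) | offset
```

For example, `split_va(0x80001234)` yields tag `0x400009` and offset `0x034`.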

12.2.10 MME_DATAPATH 

The MME_DATAPATH (Memory Management Datapath) is used to process most memory man- 
agement functions performed by the Mbox. Specifically, it performs the following functions: 

• Creates read references of PTEs in order to obtain virtual address translations not currently 
cached in the TB. 

• Creates TB fill references in order to write PTE data into the TB. 

• Stores memory management internal processor registers. 

• Stores virtual addresses associated with memory management faults or TB parity errors. 

The MME_DATAPATH implements these functions with a register file and an ALU. See Section 12.5.1 
for a more complete description of the MME_DATAPATH. 

12.2.11 ARBITRATION LOGIC 

The ARBITRATION LOGIC is responsible for determining which reference source drives its 
reference packet into the S5 pipe. (See Section 12.3.4 for more information about reference 
arbitration.) 

12.2.12 S6_PIPELATCH 

The S6_PIPELATCH is the buffer between the S5 and S6 stages of the Mbox pipeline. It latches 
the S5 reference packet, modifies it appropriately, and drives it as an S6 reference packet into 
the S6 pipe. M_QUE%S5_DATA_H<31:0> is driven onto both the upper and lower halves of 
B%S6_DATA_H<63:0>. M%S6_CMD_H<4:0> is either: 

1. driven by M_QUE%S5_CMD_H<4:0>, or 

2. changed into a NOP. 
12.2.13 DMISS_LATCH and IMISS_LATCH 

The DMISS_LATCH stores the currently outstanding D-stream read. That is, a D-stream read 
which missed in the Pcache is stored in the DMISS_LATCH until the corresponding Pcache block 
fill operation completes. The DMISS_LATCH also stores IPR_RDs to be processed by the Cbox 
until the Cbox supplies the data. I-stream reads are handled analogously by the IMISS_LATCH 
except that IPR_RDs are never handled by the IMISS_LATCH. 

The following figure illustrates the structure of the DMISS_LATCH and the IMISS_LATCH: 
Figure 12-10: DMISS_LATCH and IMISS_LATCH 

[Block diagram. Labels recoverable from the figure: inputs S5_PA<31:5>, S6_PA<31:0>, 
S6_TAG<4:0>, S6_DEST<1:0>, S6_QUAL<3>, S6_QUAL<2>, REQ_DQW<0>, and CBOX_ADDR<31:0>; fields 
VALID BIT, ADDRESS, TAG, DESTINATION, 1ST UNALIGNED, 2ND UNALIGNED, DATA_RETURNED, 
NON-CACHEABLE, and 1ST FILL; comparator outputs PCACHE_BLK_MATCH and HEXAWORD ADDR MATCH; 
outputs MISS_LAT_TAG<4:0>, MISS_LAT_DEST<1:0>, MISS_LAT_QUAL<3>, MISS_LAT_QUAL<2>, 
DO_NOT_CACHE_REF, and FIRST_FILL.] 


These two latches have built-in comparators in order to detect the following conditions: 

• If the hexaword address of an invalidate matches the hexaword address stored in either 
MISS_LATCH, the corresponding MISS_LATCH sets a bit to indicate that the corresponding 
fill operation is no longer cacheable in the Pcache. 

• Address<11:5> addresses a particular Pcache index (corresponding to two Pcache blocks). If 
address<8:5> of the DMISS_LATCH matches the corresponding bits of the physical address 
of an S5 I-stream read, the S5 I-stream read is stalled until the entire D-stream fill operation 
completes. This prevents a D-stream fill sequence to a given Pcache block from happening 
simultaneously with an I-stream fill sequence to the same Pcache block. 

• By the same argument, address<8:5> of the IMISS_LATCH is compared against S5 D-stream 
reads to prevent another simultaneous I-stream/D-stream fill sequence to the same Pcache 
block. 

• Address<8:5> of both MISS_LATCHes is compared against any S5 memory write operation. This 
is necessary to prevent the write from interfering with the cache fill sequence. 
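
The partial-index comparison described in the bullets above can be sketched as follows. The function names and the explicit valid flag are hypothetical; the sketch models only the address<8:5> comparison, not the stall sequencing.

```python
def index_bits_8_5(pa):
    """Extract physical address bits <8:5>."""
    return (pa >> 5) & 0xF


def must_stall(s5_pa, miss_latch_pa, miss_latch_valid):
    """True when a valid miss latch matches the S5 reference on bits <8:5>."""
    return miss_latch_valid and (
        index_bits_8_5(s5_pa) == index_bits_8_5(miss_latch_pa)
    )
```

Since only four bits are compared, references to different Pcache blocks can still match; the comparison errs on the side of stalling.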

See Section 12.3.5.1 for a more complete understanding of the DMISS_LATCH/IMISS_LATCH 
functions in the context of overall Mbox operation. 

12.2.14 MD_BUS_ROTATOR 

The function of the MD_BUS_ROTATOR is to right-justify read data and drive it on the 
M%MD_BUS_H. For unaligned reads (see Section 12.3.17.1) the MD_BUS_ROTATOR is designed to 
assemble read data from two read references and drive it on the M%MD_BUS_H in right-justified 
form. This rotator, coupled with the Mbox decomposition of unaligned references into two aligned 
references, allows the Ibox and Ebox to issue unaligned D-stream reads and receive the requested 
data aligned to the Ebox datapath. 

The MD_BUS_ROTATOR is illustrated below: 
MD_BUS_ROTATOR 

[Block diagram. Recoverable content: the eight byte inputs B%S6_DATA<63:56>, B%S6_DATA<55:48>, 
B%S6_DATA<47:40>, B%S6_DATA<39:32>, B%S6_DATA<31:24>, B%S6_DATA<23:16>, B%S6_DATA<15:8>, and 
B%S6_DATA<7:0> feed the rotator; rotation is controlled by M%S6_PA<2:0> and 
M_QUE%S6_QUAL<3:2>.] 




Although the diagram above describes the MD_BUS_ROTATOR as an 8-way byte barrel shifter, 
its actual design is a functional subset of a full barrel shifter. The lower four bytes of the output 
of the rotator are designed as a full 8-way byte barrel shifter in order to right-justify D-stream 
longword data. However, the upper four bytes always directly pass M%MD_BUS_H<63:32> since 
these bytes are only used when aligned I-stream quadword data is sent to the VIC. 
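
The right-justification of the lower four bytes can be pictured with the following sketch, which rotates a 64-bit quadword by the byte offset in PA<2:0> and keeps the low longword. The function name is hypothetical, and the sketch covers a single reference only; it does not model the merging of data from the two halves of an unaligned read.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def right_justify_longword(quadword, pa_low):
    """Rotate the quadword right by pa_low bytes and return the low 32 bits.

    quadword -- 64-bit data value, byte 0 in bits <7:0>
    pa_low   -- physical address bits <2:0> (byte offset within the quadword)
    """
    shift = 8 * (pa_low & 7)
    rotated = ((quadword >> shift) | (quadword << (64 - shift))) & MASK64
    return rotated & 0xFFFFFFFF
```

With `quadword = 0x8877665544332211`, an offset of 2 selects bytes 2 through 5 and returns `0x66554433`.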

12.2.15 Pcache 

The Pcache is a two-way set associative, read allocate, no-write allocate, write through, physical 
address cache of I-stream and D-stream data. It stores 8192 bytes (8K) of data and 256 tags 
corresponding to 256 hexaword blocks (1 hexaword = 32 bytes). Each tag is 20 bits wide, corresponding 
to bits <31:12> of the physical address. There are four quadword subblocks per block 
with a valid bit associated with each subblock. The access size for both Pcache reads and writes 
is one quadword. Byte parity is maintained for each byte of data (32 bits per block). One bit of 
parity is maintained for every tag. The Pcache has a one cycle access and a one cycle repetition 
rate for both reads and writes (note, however, that the entire Mbox latency is two cycles due to 
the two stage Mbox pipeline). 
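
The geometry above implies a particular split of the 32-bit physical address. The sketch below works it out in Python. The tag field <31:12> comes from the text; the index field <11:5> is inferred from 256 blocks in two ways (128 sets) of 32 bytes each, and should be read as an assumption. The names PcacheFields and split_pa are hypothetical.

```python
from typing import NamedTuple

BLOCK_BYTES = 32          # one hexaword per block
WAYS = 2
BLOCKS = 256              # equals the number of tags
SETS = BLOCKS // WAYS     # 128 sets


class PcacheFields(NamedTuple):
    tag: int        # PA<31:12>, 20 bits (from the text)
    index: int      # PA<11:5>, inferred from the geometry
    subblock: int   # PA<4:3>, selects one of four quadwords
    byte: int       # PA<2:0>, byte within the quadword


def split_pa(pa: int) -> PcacheFields:
    """Split a 32-bit physical address into Pcache lookup fields."""
    return PcacheFields(
        tag=(pa >> 12) & 0xFFFFF,
        index=(pa >> 5) & (SETS - 1),
        subblock=(pa >> 3) & 0x3,
        byte=pa & 0x7,
    )
```

The field widths add up as expected: 20 tag bits, 7 index bits, 2 subblock bits, and 3 byte bits cover all 32 address bits.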

The Pcache represents the first level of D-stream memory hierarchy and the second level of I- 
stream memory hierarchy in all NVAX computer systems. Pcache entries must be invalidated in 
order to maintain cache coherency with higher levels of the memory hierarchy. See Section 12.4 
for more information on the Pcache. 






NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 



12.3 REFERENCE PROCESSING 

This section discusses how references are processed by the Mbox, and how the Mbox functional 
components interact to carry out reference processing. 

12.3.1 REFERENCE DEFINITIONS 

The following table describes all types of references processed by the Mbox: 



Table 12-1: Reference Definitions 

Name           Value (hex)   Reference Source    Description 
-------------  ------------  ------------------  ---------------------------------------------- 
IREAD          0E            Ibox                Aligned quadword I-stream read 
DREAD          1C            Ibox, Ebox, Mbox    Variable length D-stream read 
DREAD_MODIFY   1D            Ibox                Variable length D-stream read with modify 
                                                 intent as a result of Ibox-decoded modify 
                                                 specifiers 
DREAD_LOCK     1F            Ebox                Variable length D-stream read with atomic 
                                                 memory lock 
WRITE_UNLOCK   1A            Ebox                Variable length write with atomic memory 
                                                 unlock 
WRITE          1B            Ebox                Variable length write 
DEST_ADDR      0D            Ibox                Supplies address of a write-only destination 
                                                 specifier 
STORE          19            Ebox                Supplies write data corresponding to a 
                                                 previously translated destination specifier 
                                                 address 
IPR_WR         06            Ebox                Internal Processor Register Write 
IPR_RD         07            Ebox                Internal Processor Register Read 
IPR_DATA       04            Mbox                Transfers Mbox IPR data to Ebox 
LOAD_PC        05            Ebox                Transfers a PC value to Ibox via 
                                                 M%MD_BUS_H<31:0> 
PROBE          09            Ebox                Mbox returns ACV/TNV/M=0 status of specified 
                                                 address to Ebox 
MME_CHK        08            Ebox, Mbox          Performs ACV/TNV/M=0 check on specified 
                                                 address and invokes the appropriate memory 
                                                 management exception 
TB_TAG_FILL    0C            Ebox, Mbox          Writes a TB tag into a TB entry 
TB_PTE_FILL    14            Ebox, Mbox          Writes PTE data into a TB entry 
TBIS           10            Ebox                Invalidates a specific PTE entry in the TB 
TBIA           (illegible)   Ebox, Mbox          Invalidates all entries in TB 
TBIP           11            Ebox                Invalidates all PTE entries in TB 
                                                 corresponding to process-space translations 
D_CF           03            Cbox                D-stream quadword Pcache fill 
I_CF           02            Cbox                I-stream quadword Pcache fill 
INVAL          01            Cbox                Hexaword invalidate of a Pcache entry 
STOP_SPEC_Q    0F            Ibox                Stops processing of specifier references 
NOP            00            Ibox, Ebox, Mbox    No operation 

12.3.2 SIMPLE MBOX PIPELINE FLOW 

A major Mbox design consideration was to return requested read data to the Ibox and Ebox as 
quickly as possible in order to minimize macropipeline stalls. If the Ebox pipeline is stalled 
because it is waiting for a memory operand to be loaded into its register file (md_stall condition), 
then the amount of time the Ebox remains stalled is related to how quickly the Mbox can return 
the data. In order to minimize Mbox read latency, a two-cycle pipeline organization is used. This 
organization allows requested read data to be returned in a minimum of two cycles after the read 
reference is shipped to the Mbox. 

The timing diagram below illustrates the basic sequential processing within the two-cycle Mbox 
pipeline. 

[Figure: Basic Mbox Timing. S5 pipe: TB lookup, then the start of the Pcache access (read, write, 
fill, invalidate). S6 pipe: completion of the Pcache access, then rotate and return data to Ibox 
and Ebox.] 

At the start of the S5 cycle, the Mbox drives the highest priority reference into the S5 pipe. The 
Mbox arbitration logic determines which reference should be driven into S5 at the end of the 
previous cycle. The first half of the S5 cycle is used to translate the virtual address to a physical 
address via the TB. 

The Pcache access is started during phase two of S5 and continues into the first quarter of S6. 

If the reference should cause data to be returned to the Ibox or Ebox, the first three phases of 
the S6 cycle are used to rotate the read data (if the data is not right-justified) and to transfer the 
data back to the Ibox and/or Ebox. 

Thus, assuming an aligned read reference is issued in cycle x by the Ibox or Ebox, the Mbox can 
return the requested data in cycle x+2 provided that 1) the translated read address was cached 
in the TB, 2) no memory management exceptions occurred, 3) the read data was cached in the 
Pcache, and 4) no other higher priority or pending reference inhibited the immediate processing 
of this read. 

12.3.3 REFERENCE ORDER RESTRICTIONS 

Due to the macropipeline structure of NVAX, the Mbox can receive "out-of-order" references 
from the Ibox and Ebox. That is, the Ibox can send a reference corresponding to an opcode 
decode before the Ebox has sent all references corresponding to the previous opcode. Issuing 
references "out-of-order" in a macropipeline introduces complexities in the Mbox to guarantee 
that all references will be processed correctly within the context of the VAX architecture, the 
NVAX macropipeline, and the Mbox hardware. Many of these complexities take the form of 
restrictions on how and when references can be processed by the Mbox. 

The following synchronization example is useful to illustrate several of the reference order 
restrictions. 

Figure 12-13: 2 Processor Synchronization Example 

PROCESSOR 1                 PROCESSOR 2 

MOVL #1,C                   10$: BLBC T,10$ 
MOVL #1,T                        MOVL C,R0 

This example illustrates two processors operating in a multiprocessor environment. Initially, 
processor 1 owns the critical section corresponding to memory location T. Processor 1 will modify 
memory location C since it currently has ownership. Subsequently, processor 1 will release 
ownership by writing a 1 into T. Meanwhile, processor 2 is "spinning" on location T waiting 
for T to become non-zero. Once T is non-zero, processor 2 will read the value of C. 

Note that this example is not the preferred way to implement synchronization. A better way 
would be to use VAX interlocked instructions, which guarantee atomicity. This is, however, a 
valid example under current SRM rules because the SRM does not disallow an NVAX multiprocessor 
system from supporting this synchronization structure. 

The following discussion explains the Mbox reference order restrictions. 

12.3.3.1 No D-stream hits under D-stream misses 

"No D-stream hits under D-stream misses" refers to the fact that the Mbox will not allow a 
D-stream read reference, which hits in the Pcache, to execute as long as requested data for a 
previous D-stream read has not yet been supplied. 

Consider the code that processor 2 executes in the example above. If the Mbox allowed D-stream 
hits under D-stream misses, then it is possible for the Ibox read of C to hit in the Pcache during a 
pending read miss sequence to T. In doing so, the Mbox could supply the value of C before processor 
1 modified C. Thus, processor 2 would get the old C with the new T causing the synchronization 
code to operate improperly. 

Note that, while D-stream hits under D-stream misses are prohibited, the Mbox will execute a 
D-stream hit under a D-stream fill operation. In other words, the Mbox will supply data for a 
read which hit in the Pcache while a Pcache fill operation to a previous missed read is in progress, 
provided that the missed read data has already been supplied. 

I-stream and D-stream references are handled independently of each other. That is, I-stream 
processing can proceed regardless of whether a D-stream miss sequence is currently executing, 
assuming there is no Pcache index conflict. 

12.3.3.2 No I-stream hits under I-stream misses 

This is the analogous case for I-stream read references. This restriction is necessary to guarantee 
that the Ibox will always receive its requested I-stream reference first, before any other I-stream 
data is received. 

12.3.3.3 Maintain the order of writes 

Consider the example shown above. If the Mbox of processor 1 were to reorder the write to C 
with the write to T, then processor 2 could read the old value of C before processor 1 updated C. 
Thus, the Mbox must never re-order the sequence of writes generated by the Ebox microcode. 

12.3.3.4 Maintain the order of Cbox references 

Again consider the example above. Processor 2 will receive an invalidate for C as a result of 
the write done by processor 1 in the MOVL #1,C instruction. If this invalidate were not 
processed until after processor 2 did the read of C, then the wrong value of C would be placed in 
R0. 

Strictly speaking, we must guarantee that the invalidate to C happens before the read of C. 
However, since C may be in the Pcache of processor 2, there is nothing to stop the read of C from 
occurring before the invalidate is received. Thus, from the point of view of processor 2, the real 
restriction here is that the invalidate to C must happen before the invalidate to T, which must 
happen before the read of T which causes processor 2 to fall through the loop. As long as the 
Mbox does not re-order Cbox references, the invalidate to C will occur before a non-zero value of 
T is read. 

12.3.3.5 Preserve the order of Ibox reads relative to any pending Ebox writes to the same 
quadword address 

Consider the following example: 

Figure 12-14: Memory Scoreboard Example 



MOVL #1,C 
MOVL C,R0 



In the NVAX macropipeline, the Ibox prefetches specifier operands. Thus, the Mbox receives a 
read of C corresponding to the "MOVL C,R0" instruction. This read, however, cannot be done 
until the write to C from the previous instruction completes. Otherwise, the wrong value of C 
will be read. 

In general, the Mbox must ensure that Ibox reads will only be executed once all previous writes 
to the same location have completed. 
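
The ordering rule in this section reduces to an address comparison at quadword granularity. The Python sketch below shows only that comparison. How the Mbox actually tracks pending write addresses is not described here, so the list of pending write addresses and both function names are hypothetical.

```python
from typing import Iterable


def same_quadword(pa_a: int, pa_b: int) -> bool:
    """True when both byte addresses fall in the same aligned quadword."""
    return (pa_a >> 3) == (pa_b >> 3)


def ibox_read_may_issue(read_pa: int, pending_write_pas: Iterable[int]) -> bool:
    """An Ibox read is held while any earlier write to its quadword is pending."""
    return not any(same_quadword(read_pa, w) for w in pending_write_pas)
```

In the MOVL #1,C / MOVL C,R0 example, the read of C is held until the pending write to C is no longer in the list.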

12.3.3.6 I/O Space Reads from the Ibox must only be executed when the Ebox is executing the 
corresponding Instruction 

Unlike memory reads, reads to certain I/O space addresses can cause state to be modified. As a 
result, these I/O space reads must only be done in the context of the instruction execution to which 
the read corresponds. Due to the macropipeline structure of NVAX, the Ibox can issue an I/O 
space read to prefetch an operand of an instruction which the Ebox is not currently executing. 
Due to branches in instruction execution, the Ebox may in fact never execute the instruction 
corresponding to the I/O space read. Therefore, in order to prevent improper state modification, 
the Mbox must inhibit the processing of I/O space reads issued by the Ibox until the Ebox is 
actually executing the instruction corresponding to the I/O space read. 

12.3.3.7 Reads to the same Pcache block as a pending read/fill operation must be inhibited 

The organization of the Pcache is such that one address tag corresponds to four subblock valid 
bits. Therefore, the validated contents of all four subblocks must always correspond to the tag 
address. If two distinct Pcache fill operations are simultaneously filling the same Pcache block, 
it is possible for the fill data to be intermixed between the two fill operations. As a result, an 
IREAD to the same Pcache block as a pending D-stream read/fill is inhibited until the pending 
read/fill operation completes. Similarly, a D-stream read to the same Pcache block as a pending 
I-stream read/fill is also inhibited until the fill completes. 

12.3.3.8 Writes to the same Pcache block as a pending read/fill operation must be inhibited until 
the read/fill operation completes 

As in the above, this restriction is necessary in order to guarantee that all valid subblocks contain 
valid up-to-date data. Consider the following situation. The Mbox executes a write to an invalid 
subblock of a Pcache block which is currently being filled. One cycle later, the cache fill to that 
same subblock arrives at the Pcache. Thus, the latest subblock data, which came from the write, 
is overwritten by older cache fill data. This subblock is now marked valid with "old" data. To 
avoid this situation, writes to the same Pcache block as a pending read/fill operation are inhibited 
until the cache fill sequence completes. 

12.3.4 REFERENCE ARBITRATION 

The Mbox maintains seven different reference storage devices in S5. The purpose of these devices 
is to buffer pending references, which originate from different sections of the chip, until they can 
be processed by the Mbox. In order to optimize performance of the NVAX pipeline, and to maintain 
functional correctness of reference processing in light of the Mbox hardware configuration and 
reference order restrictions, the Mbox services references from these queues in a prioritized 
fashion. 

12.3.4.1 Arbitration Priority 

During every Mbox cycle, the reference arbitration logic is responsible for determining which 
unserviced references should be processed next cycle. The reference sources are listed below from 
highest to lowest priority: 

1. CBOX_LATCH 

2. RTY_DMISS_LATCH 

3. MME_LATCH 

4. VAP_LATCH 

5. EM_LATCH 

6. SPEC_QUEUE 

7. IREF_LATCH 

8. nothing can be driven ==> Mbox drives a NOP command into S5 

This prioritized scheme does not directly indicate which pending reference will be driven next, 
but instead indicates in what order the pending references should be tested to determine which 
one will be processed. Conceptually, the highest-priority pending reference which satisfies all 
conditions for driving the reference is the one which is allowed to execute during the subsequent 
cycle. 

The rationale behind this priority scheme can be explained as follows. All references coming from 
the Cbox are always serviced as soon as they are available. Since Cbox references are guaranteed 
to complete in S5 in one cycle, we eliminate the need to queue up Cbox references and to provide 
a back-pressure mechanism to notify the Cbox to stop sending references. 

A D-stream read reference in the RTY_DMISS_LATCH is guaranteed to have cleared all potential 
memory management problems. Therefore, any reference stored in this latch is the second 
consideration for processing. 

If a reference related to memory management processing is pending in the MME_LATCH, it 
is given priority over the remaining four sources because the Mbox is designed to clear all 
memory management exceptions through the use of the MME_LATCH before normal processing 
can resume. 

The VAP_LATCH stores the second reference of an unaligned reference pair. Since we desire 
to complete the entire unaligned reference before starting another reference, the VAP_LATCH 
has next highest priority in order to complete the unaligned sequence that was initiated from a 
reference of lesser priority. 

The EM_LATCH stores references from the Ebox. It is given priority over the SPEC_QUEUE 
and IREF_LATCH sources because Ebox references are physically further along in the pipe than 
Ibox references. The presumed implication of this fact is that the Ebox has a more immediate 
need to satisfy its reference requests than the Ibox, since the Ebox is always performing real 
work and the Ibox is prefetching operands that may, in fact, never be used. 

The SPEC_QUEUE stores Ibox operand references. It is next in line for consideration. The 
SPEC_QUEUE has priority over the IREF_LATCH because specifier references are again 
considered further along in the pipeline than I-stream prefetching. 

If no other reference can currently be driven, the IREF_LATCH can drive an I-stream read 
reference in order to supply data to the Ibox. 

If no reference can currently be driven into S5, the Mbox automatically drives a NOP command. 

12.3.4.2 Arbitration Algorithm 

Based on the priority scheme discussed above, the arbitration logic tests each reference to see 
whether it can be processed next cycle by evaluating the current state of the Mbox. The test 
associated with each latch is described below: 

• CBOX_LATCH: Since Cbox references always want to be processed immediately, a validated 
CBOX_LATCH always causes the Cbox reference to be driven before all other pending 
references. 

• RTY_DMISS_LATCH: A pending D-stream read reference will be driven from this latch once 
the final D_CF command has been retired from the S5 pipe. 

• MME_LATCH: A pending MME reference will be driven when the contents of the 
MME_LATCH are validated. 

• VAP_LATCH: A reference from the VAP_LATCH will be driven provided that the VAP_LATCH 
is validated. 

• EM_LATCH: A reference from the EM_LATCH will be driven provided that the EM_LATCH 
is validated. 

• SPEC_QUEUE: A validated reference in the SPEC_QUEUE will be driven provided that 
the SPEC_QUEUE has not been stopped due to explicit Ebox writes in progress (see 
Section 12.3.20). 

• IREF_LATCH: A reference from the IREF_LATCH will be driven provided that the 
IREF_LATCH has not been stopped due to a pending READ_LOCK/WRITE_UNLOCK 
sequence (see Section 12.3.19.2). 

If none of the conditions above are satisfied, the Mbox will drive a NOP command onto 
M_QUE%S5_CMD_H<4:0>, causing the S5 pipe to become idle. 
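
The priority order and per-source tests above amount to a fixed-priority arbiter. The following Python sketch restates them under simplifying assumptions: each source is reduced to a "validated" flag plus the one gating condition named in its bullet, and stall conditions described elsewhere in the specification are not modeled. The MboxState class and its field names are hypothetical.

```python
from dataclasses import dataclass, field

# Sources in the priority order listed in the text, highest first.
PRIORITY = (
    "CBOX_LATCH", "RTY_DMISS_LATCH", "MME_LATCH", "VAP_LATCH",
    "EM_LATCH", "SPEC_QUEUE", "IREF_LATCH",
)


@dataclass
class MboxState:
    valid: set = field(default_factory=set)  # sources holding a validated reference
    final_dcf_retired: bool = True           # gates RTY_DMISS_LATCH
    spec_queue_stopped: bool = False         # gates SPEC_QUEUE
    iref_latch_stopped: bool = False         # gates IREF_LATCH


def can_drive(source: str, state: MboxState) -> bool:
    """Apply the per-source test from the arbitration algorithm."""
    if source not in state.valid:
        return False
    if source == "RTY_DMISS_LATCH":
        return state.final_dcf_retired
    if source == "SPEC_QUEUE":
        return not state.spec_queue_stopped
    if source == "IREF_LATCH":
        return not state.iref_latch_stopped
    return True  # CBOX_LATCH, MME_LATCH, VAP_LATCH, EM_LATCH need only be validated


def arbitrate(state: MboxState) -> str:
    """Return the source driven into S5 next cycle, or "NOP" if none qualifies."""
    for source in PRIORITY:
        if can_drive(source, state):
            return source
    return "NOP"
```

Note that a higher-priority source which fails its test does not block lower-priority sources: a pending RTY_DMISS_LATCH reference waiting on the final D_CF lets a SPEC_QUEUE reference through.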

12.3.5 READS 

12.3.5.1 Generic Read-hit and Read-miss/Cache_fill Sequences 

In order to orient the reader as to how memory reads are processed by the Mbox, this section will 
describe the "vanilla" read sequence. It does not discuss reads which TB_MISS, or otherwise are 
stalled for a variety of different reasons. 

The byte mask generator generates the corresponding byte mask by looking at M_QUE%S5_VA_H<2:0> 
and M_QUE%S5_DL_H<1:0> and then drives the byte mask data onto M%S6_BYTE_MASK_H<7:0> during 
the subsequent cycle. Byte mask data is generated on a read operation in order to supply the byte 
alignment information to the Cbox on an I/O space read. 
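
A byte mask of this kind can be computed from the low three address bits and the data length. The Python sketch below is illustrative only. The encoding of the two-bit data length field is not given in this section, so the function takes the length in bytes. It also assumes that mask bit i corresponds to byte i of the aligned quadword, that bytes falling past the quadword are left to the second reference of an unaligned pair, and that a quadword length is treated as a longword. The name byte_mask is hypothetical.

```python
def byte_mask(va_low: int, length_bytes: int) -> int:
    """Return an 8-bit mask of the quadword bytes touched by a reference.

    va_low       -- VA<2:0>, byte offset within the aligned quadword
    length_bytes -- 1, 2, 4, or 8 (8 is treated as 4 here; see assumptions above)
    """
    if length_bytes not in (1, 2, 4, 8):
        raise ValueError("length must be byte, word, longword, or quadword")
    n = min(length_bytes, 4)
    # Set n consecutive bits starting at the byte offset, clipped to the quadword.
    return (((1 << n) - 1) << (va_low & 0x7)) & 0xFF
```

For example, a word at offset 5 touches bytes 5 and 6, and a longword at offset 6 is clipped to bytes 6 and 7 of the first quadword.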

When a read reference is initiated in the S5 pipe, the address is translated by the TB (assuming 
the address was virtual) to a physical address during the first half of the S5 cycle. The Pcache 
initiates a cache lookup sequence using this physical address during the second half of the S5 
cycle. This cache access sequence overlaps into the following S6 cycle. During phase four of the 
S6 cycle, the Pcache determines whether the read reference is present in its array. 

If the Pcache determined that the requested data is present, a "cache hit" or "read hit" condition 
occurs. In this event, the Pcache drives the requested data onto B%S6_DATA_H<63:0>. The signal 
M%CBOX_REF_ENABLE_L is de-asserted to inform the Cbox that it should not process the S6 read 
since the Mbox will supply the data from the Pcache. 

If the Pcache determined that the requested data is not present, a "cache miss" or "read 
miss" condition occurs. In this event, the read reference is loaded into the IMISS_LATCH or 
DMISS_LATCH (depending on whether the read was I-stream or D-stream) and the Cbox is 
instructed to continue processing the read by the Mbox assertion of M%CBOX_REF_ENABLE_L. At 
some point later, the Cbox obtains the requested data. The Cbox will then send four quadwords 
of data using the I_CF (I-stream cache fill) or D_CF (D-stream cache fill) commands. The four 
cache fill commands together are used to fill the entire Pcache block corresponding to the hexaword 
read address. In the case of D-stream fills, one of the four cache fill commands will be qualified 
with C%REQ_DQW_H, indicating that this quadword fill contains the requested D-stream data 
corresponding to the quadword address of the read. When this fill is encountered, it will be 
used to supply the requested read data to the Mbox, Ibox and/or Ebox. 

If, however, the physical address corresponding to the I_CF or D_CF command falls into I/O 
space, only one quadword fill is returned and the data is not cached in the Pcache. Only memory 
data is cached in the Pcache. 

Each cache fill command sent to the Mbox is latched in the CBOX_LATCH. Note that neither 
the entire cache fill address nor the fill data are loaded into the CBOX_LATCH. The address in 
the IMISS_LATCH or DMISS_LATCH, together with two quadword alignment bits latched in the 
CBOX_LATCH, are used to create the quadword cache fill address when the cache fill command 
is executed in S5. When the fill operation propagates into S6, the Cbox drives the corresponding 
cache fill data onto B%S6_DATA_H<63:0> in order for the Pcache to perform the fill. 
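
The address formation described above can be written out as a small Python sketch. It assumes that the two quadword alignment bits from the CBOX_LATCH simply replace bits <4:3> of the miss address held in the IMISS_LATCH or DMISS_LATCH; the function name fill_address is hypothetical.

```python
HEXAWORD_MASK = 0xFFFFFFE0  # clears PA<4:0>, leaving the hexaword block address


def fill_address(miss_pa: int, qw_align: int) -> int:
    """Form the quadword cache fill address for one I_CF or D_CF command.

    miss_pa  -- physical address saved in the miss latch
    qw_align -- two quadword alignment bits latched with the fill command
    """
    return (miss_pa & HEXAWORD_MASK) | ((qw_align & 0x3) << 3)
```

Stepping qw_align through 0 to 3 yields the four quadword addresses that together fill the hexaword block.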

12.3.5.1.1 Returning Read Data 

Data resulting from a read operation is driven on B%S6_DATA_H by the Pcache (in the cache hit 
case) or by the Cbox (in the cache miss case). This data is then driven on M%MD_BUS_H<63:0> 
by the MD_BUS_ROTATOR in right-justified form. The signals M%VIC_DATA_L, M%IBOX_DATA_L, 
M%EBOX_IPR_WR_H, M%EBOX_DATA_H, and M%MBOX_DATA are conditionally asserted with the data to 
indicate the destination(s) of the data. 

12.3.5.1.1.1 Pcache Data Bypass 

In order to return the requested read data to the Ibox and/or Ebox as soon as possible, the Cbox 
implements a Pcache Data Bypass mechanism. When this mechanism is invoked, the requested 
read data can be returned one cycle earlier than when the data is driven for the S6 cache fill 
operation. The bypass mechanism works by having the Mbox inform the Cbox that the next S6 
cycle will be idle, and thus the B%S6_DATA_H bus will be available to the Cbox. When the Cbox is 
informed of the S6 idle cycle, it drives the B%S6_DATA_H bus with the requested read data if read 
data is currently available (if no read data is available during a bypass cycle, the Cbox drives 
some indeterminate data and no valid data is bypassed). The read data is then formatted by the 
MD_BUS_ROTATOR and transferred onto the M%MD_BUS_H to be returned to the Ibox and/or 
Ebox, qualified by M%VIC_DATA_L, M%IBOX_DATA_L, and/or M%EBOX_DATA_H. 

12.3.5.2 I-stream Read Processing 

Memory access to all I-stream code is implemented by the Mbox on behalf of the Ibox. The Ibox 
uses the I-stream data to load its prefetch queue and to fill the VIC (Virtual Instruction Cache). 

When the Ibox requires I-stream data which is not stored in the prefetch queue or the VIC, the 
Ibox issues an I-stream read request which is latched by the IREF_LATCH. The Ibox address is 
always interpreted by the Mbox as being an aligned quadword address. Depending on whether 
the read hits or misses in the Pcache, the amount of data returned varies. The Ibox continually 
accepts I-stream data from the Mbox until the Mbox qualifies I-stream MD_BUS data with the 
M%LAST_FILL_H signal. M%LAST_FILL_H informs the Ibox that the current fill terminates the 
initial IREAD transaction. 

12.3.5.2.1 I-stream Read Hits 

When the requested data hits in the Pcache, the Mbox turns the IREF_LATCH reference into a 
series of I-stream reads to implement a VIC "fill forward" algorithm. The fill forward algorithm 
generates increasing quadword read addresses from the original address to the highest quadword 
address of the original hexaword address. In other words, the Mbox generates read references so 
that the hexaword VIC block corresponding to the original address is filled from the point of the 
request to the end of the block. The theory behind this fill forward scheme is that it only makes 
sense to supply I-stream data following the requested reference since I-stream execution causes 
monotonically increasing I-stream addresses (neglecting branches). 

The fill forward scheme is implemented by the IREF_LATCH. Once the IREF_LATCH read 
completes in S5, the IREF_LATCH quadword address incrementor modifies the stored address 
of the IREF_LATCH so that its contents become the next quadword IREAD. Once this "new" 
reference completes in S5, the next IREAD reference is generated. When the IREF_LATCH finally 
issues the IREAD corresponding to the highest quadword address of the hexaword address, the 
forward fill process is terminated by invalidating the IREF_LATCH. 
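
The address sequence produced by fill forward can be sketched in Python as follows. This models only the sequence of quadword addresses, assuming every IREAD in the chain hits in the Pcache; the function name fill_forward_addresses is hypothetical.

```python
def fill_forward_addresses(iread_pa: int) -> list:
    """List the quadword addresses issued for one fill forward sequence.

    The Ibox address is interpreted as an aligned quadword address, and the
    sequence runs up to the highest quadword of the same hexaword.
    """
    start = iread_pa & 0xFFFFFFF8   # align down to a quadword
    last = start | 0x18             # highest quadword in the hexaword
    return list(range(start, last + 8, 8))
```

A request to the second quadword of a block therefore produces three IREADs, and a request to the last quadword produces just one.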

12.3.5.2.2 I-stream Read Misses 

The fill forward algorithm described above is always invoked upon receipt of an IREAD. However, 
when one of the IREADs is found to have missed in the Pcache, the subsequent IREAD 
references are flushed out of the S5 pipe and the IREF_LATCH. The missed IREAD causes 
the IMISS_LATCH to be loaded and the Cbox to continue processing the read. When the Cbox 
returns the resulting four quadwords of Pcache data, all four quadwords are transferred back 
to the Ibox qualified by M%VIC_DATA_L. This, in effect, results in a VIC "fill full" algorithm since 
the entire VIC block will be filled. Fill full is done instead of fill forward because it costs little 
to implement. The Mbox must allocate a block of cycles to process the four cache fills; therefore, 
all the Pcache fill data can be shipped to the VIC with no extra cost in Mbox cycles since the 
M%MD_BUS_H would otherwise be idle during these fill cycles. 

Note that the Ibox is unaware of what fill mode the Mbox is currently operating in. The 
VIC continues to fill I-stream data from the M%MD_BUS_H whenever M%VIC_DATA_L is asserted, 
regardless of the Mbox fill mode. The Mbox asserts the M%LAST_FILL_H signal to the Ibox during 
the cycle in which the Mbox is driving the last I-stream fill to the Ibox. M%LAST_FILL_H informs 
the Ibox that it is receiving the final VIC fill this cycle and that it should not expect any more. 
In fill forward mode, the Mbox asserts M%LAST_FILL_H when the quadword alignment equals 11 
(i.e. the upper-most quadword of the hexaword). In fill full mode, the Mbox receives the last fill 
information from the Cbox and transfers it to the Ibox through the M%LAST_FILL_H signal. 

It is possible to start processing I-stream reads in fill forward mode, but then switch to fill full. 
This could occur because one of the references in the chain of fill forward IREADs misses due to 
a recent invalidate or due to displacement of Pcache I-stream data by a D-stream cache fill. In 
this case, the Ibox will receive more than four fills but will remain in synchronization with the 
Mbox because it continually expects to see fills until M%LAST_FILL_H is asserted. 

12.3.5.2.3 I/O Space I-stream Reads 

See Section 12.3.5.4. 

12.3.5.3 D-stream Read Processing 

Memory access to all D-stream references is implemented by the Mbox on behalf of the Ibox 
(for specifier processing), the Mbox (for PTE references), and the Ebox (for all other D-stream 
references). 

In general, D-stream read processing behaves the same way as I-stream read processing except 
that there is no fill forward or fill full scheme. In other words, only the requested data is shipped 
to the initiator of the read. From the Pcache point of view, however, a D-stream fill full scheme 
is implemented since four D_CF commands are still issued to the Pcache. 

D-stream reads can have a data length of byte, word, longword or quadword. With the exception 
of the cross-page check function, a quadword read is treated as if its data length were a longword. 
Thus a D-stream quadword read returns the lower half of the referenced quadword. The source 
of most D-stream quadword reads is the Ibox. The Ibox will issue a D-stream longword read to 
the upper half of the referenced quadword immediately after issuing the quadword read. Thus, 
the entire quadword of data is accessed by two back-to-back D-stream read operations. 

A DREAD_LOCK command always forces a Pcache read miss sequence regardless of whether 
the referenced data was actually stored in the Pcache. This is necessary in order that the read 
propagate out to the Cbox so that the memory lock/unlock protocols can be properly processed. 

12.3.5.3.1 Reads under Fills 

The Mbox will attempt to process a DREAD after the requested fill of a previous D-stream fill 
sequence has completed. This mechanism, called "reads under fills" is done to try to return read 
data to the Ibox and/or Ebox as quickly as possible, without having to wait for the previous fill 
sequence to complete. 

If the attempted read hits in the Pcache, the data is returned and the read completes. If the 
read misses in the S6 pipe, the corresponding fill sequence is not immediately initiated for two 
reasons: 

• A D-stream cache fill sequence for this read cannot be started because the DMISS_LATCH 
is full corresponding to the currently outstanding cache fill sequence. 

• The D-stream read may hit in the Pcache once the current fill sequence completes because 
the current fill sequence may supply the data necessary to satisfy the new D-stream read. 

Because this DREAD has already propagated through the S5 pipe, the read must be stored 
somewhere in order that it can be restarted in S5. The RTY_DMISS_LATCH is the mechanism 
by which the S6 read is saved and restarted in the S5 pipe. 

Once the read is stored in the RTY_DMISS_LATCH, it will be retried in S5 after the final D_CF 
reference is retired from S5 (the final D_CF completes the previous D-stream fill sequence). The 
RTY_DMISS_LATCH is invalidated when the retried reference is retired from S5. 

12.3.5.4 I/O Space Reads 

I/O space reads are defined as reads which address I/O space. Therefore, a read is an I/O read 
when the physical address bits, addr<31:29>, are all set. I/O space reads are treated by the Mbox 
in exactly the same way as any other read, except for the following differences: 

• I/O space data is never cached in the Pcache. Therefore, an I/O space read always generates 
a read-miss sequence and causes the Cbox to process the reference. 

• Unlike a memory space miss sequence, which returns a hexaword of data via four I_CF or 
D_CF commands, an I/O space read returns only one piece of data via one I_CF or D_CF 
command. Thus the Cbox always asserts C%LAST_FILL_H on the first and only I_CF or D_CF 
I/O space operation. If the I/O space read is D-stream, the returned D_CF data is always less 
than or equal to a longword in length. 

• I/O space D-stream reads are never prefetched ahead of Ebox execution. An I/O space 
D-stream read issued from the Ibox is only processed when the Ebox is known to be stalling 
on that particular I/O space read (see Section 12.3.18.1.1). 

NVAX RESTRICTION 

I-stream I/O space reads must return a quadword of data. Execution of an I-stream 
I/O space read which does not return a quadword of data is UNPREDICTABLE. 
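
The I/O space test and its effect on the number of cache fill commands can be summarized in a short Python sketch. It reads "set" in the definition above as all three of addr<31:29> being ones, which is an interpretation rather than something this section states outright; both function names are hypothetical.

```python
def is_io_space(pa: int) -> bool:
    """True when physical address bits <31:29> are all ones."""
    return ((pa >> 29) & 0x7) == 0x7


def expected_fills(pa: int) -> int:
    """Number of I_CF or D_CF commands returned for a read miss at this address."""
    # I/O space: a single fill, never cached. Memory space: four quadword fills.
    return 1 if is_io_space(pa) else 4
```

Under this reading, I/O space occupies the top eighth of the 32-bit physical address range.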

12.3.6 WRITES 

All writes are initiated by the Mbox on behalf of the Ebox. The Ebox microcode is capable of 
generating write references with data lengths of byte, word, longword, or quadword. With the 
exception of cross-page checks (see Section 12.5.1.5.4), the Mbox treats quadword write references 
as longword write references because the Ebox datapath only supplies a longword of data per 
cycle. Ebox writes can be unaligned. 

The Mbox performs the following functions during a write reference: 

• Memory Management checks: The Mbox checks to be sure the page or pages referenced have 
the appropriate write access and that the valid virtual address translations are available. 
(See Section 12.5.) 

• The supplied data is properly rotated to the memory aligned longword boundary. 

• Byte Mask Generation: The Mbox generates the byte mask of the write reference by 
examining the write address and the data length of the reference. 

• Pcache writes: The Pcache is a write-through cache. Therefore, writes are only written into 
the Pcache if the write address matches a validated Pcache tag entry. 

The one exception to this rule is when the Pcache is configured in force D-stream hit mode. 
In this mode, the data is always written to the Pcache regardless of whether the tag matches 
or mismatches. 

• All write references which pass memory management checks are transferred to the Cbox via
B%S6_DATA_H<63:0>. The Cbox is responsible for processing writes in the Bcache and for
controlling the protocols related to the write-back memory subsystem.

When write data is latched in the EM_LATCH, the 4-way byte barrel shifter associated with the
EM_LATCH rotates the EM_LATCH data into proper alignment based on the lower two bits of
the corresponding address. The Barrel Shifter Function diagram illustrates this rotation.

Write references are formed through two distinct mechanisms described below. 



12.3.6.1 Destination Specifier Writes 

Destination specifier writes are those writes which are initiated by the Ibox upon decoding a 
destination specifier of an instruction. When a destination specifier to memory is decoded, the 
Ibox issues a reference packet corresponding to the destination address. Note that no data is 
present in this packet because the data is generated when the Ebox subsequently executes the 
instruction. The command field of this packet is either a DEST_ADDR command (when the 
specifier had access type of write) or a DREAD_MODIFY command (when the specifier had access 
type of modify). 

The address of this command packet is translated by the TB, memory management access checks
are performed, and the corresponding byte mask is generated. The physical address, DL and
other qualifier bits are loaded into the PA_QUEUE. When the DEST_ADDR command completes
in S5, it is turned into a NOP command in S6 because no further processing can take place
without the actual write data.

When the Ebox executes the opcode corresponding to the Ibox destination specifier, the
corresponding memory data to be written is generated. This data is sent to the Mbox by a
STORE command. The STORE packet contains only data. When the Mbox executes the STORE
command in S5, the corresponding PA_QUEUE packet is driven into the S5 pipe. The data in
the EM_LATCH is rotated into proper longword alignment using the byte rotator and the lower
two bits of the corresponding PA_QUEUE address and is then driven into S5. In effect, the
DEST_ADDR and STORE commands are merged together to form a complete physical address
WRITE operation. This WRITE operation propagates through the S5/S6 pipeline to perform the
write in the Pcache (if the address hits in the Pcache) and in the memory subsystem.
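
The merging of DEST_ADDR and STORE can be pictured with a small Python model. This is a minimal sketch under stated assumptions: the class and field names are hypothetical, the PA_QUEUE is assumed to pair entries with STORE commands in first-in, first-out order, and translation, access checks, byte masks and data rotation are all omitted.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PaQueueEntry:
    phys_addr: int     # translated physical address of the destination
    data_length: int   # DL of the reference, in bytes


@dataclass
class WriteRef:
    phys_addr: int
    data_length: int
    data: int


class WriteMerger:
    """Toy model: DEST_ADDR queues an address, STORE supplies the data."""

    def __init__(self) -> None:
        self.pa_queue: deque = deque()  # assumed FIFO ordering

    def dest_addr(self, phys_addr: int, data_length: int) -> None:
        # The address packet carries no data; it only fills the queue.
        self.pa_queue.append(PaQueueEntry(phys_addr, data_length))

    def store(self, data: int) -> WriteRef:
        # The STORE packet carries only data; pair it with the oldest entry.
        entry = self.pa_queue.popleft()
        mask = (1 << (8 * entry.data_length)) - 1
        return WriteRef(entry.phys_addr, entry.data_length, data & mask)
```

An explicit write, by contrast, would arrive as a complete `WriteRef` and would never touch `pa_queue`.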

12.3.6.2 Explicit Writes 

The term explicit writes defines writes generated solely by the Ebox. That is, writes which do 
not result from the Ibox decoding a destination specifier but rather writes which are explicitly 
initiated and fully generated by the Ebox. An example of an explicit write is a write performed 
during a MOVC instruction. In this example, the Ebox generates the virtual write address of 
every write as well as supplying the corresponding data. The PA_QUEUE is never involved in 
processing an explicit write. 

Explicit writes are transferred to the Mbox in the form of a WRITE command issued by the Ebox.
These writes directly execute in S5 and S6 in the same manner as when a write packet is formed
from the PA_QUEUE contents and the STORE data.

12.3.6.3 Writes to I/O Space 

I/O space writes are defined as writes which address I/O space. Therefore, a write
is an I/O space write when the physical address bits, addr<31:29>, are set. I/O space writes
are treated by the Mbox in exactly the same way as any other write, except for the following
difference:

• I/O space data is never cached in the Pcache; therefore, an I/O space write always misses in
the Pcache.

12.3.6.4 Byte Mask Generation 

Since memory is byte-addressable, all memory storage devices must be able to selectively write 
specified bytes of data without writing the entire set of bytes made available to the storage device. 

The byte mask field of a write reference packet specifies which bytes within the quadword Pcache
access size get written. The byte mask is generated in the Mbox by the byte mask generation
logic based on M_QUE%S5_VA_H<2:0> and the data length of the reference.

Byte mask data is generated on a read as well as a write in order to supply the byte alignment
information to the Cbox on an I/O space read. The following table illustrates the behavior of the
byte mask generator for all aligned reads and writes:

Table 12-2: Byte Mask Logic for Aligned References

             BM           BM           BM           BM
addr<2:0>    (DL=byte)    (DL=word)    (DL=long)    (DL=quad)

000          00000001     00000011     00001111     00001111
001          00000010     00000110     00011110     00011110
010          00000100     00001100     00111100     00111100
011          00001000     00011000     01111000     01111000
100          00010000     00110000     11110000     11110000
101          00100000     01100000     unaligned    unaligned
110          01000000     11000000     unaligned    unaligned
111          10000000     unaligned    unaligned    unaligned

See Section 12.3.17.3 for a description of the byte mask generator for unaligned references.
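
The pattern in Table 12-2 can be reproduced with a few lines of Python. This is a sketch of the table's arithmetic only, not of the hardware logic: the function name `byte_mask` and the `DATA_LENGTH_BYTES` mapping are hypothetical, a quadword is treated as a longword as stated for write references, and `None` stands in for the "unaligned" entries (references that would run past the quadword).

```python
# Bytes covered by each data length; quadword references are treated
# as longword references, so both map to 4 bytes.
DATA_LENGTH_BYTES = {"byte": 1, "word": 2, "long": 4, "quad": 4}


def byte_mask(addr_low3: int, dl: str):
    """Return the 8-bit byte mask for addr<2:0> and data length dl.

    Returns None when the reference would cross the quadword boundary,
    which Table 12-2 marks as "unaligned".
    """
    length = DATA_LENGTH_BYTES[dl]
    if addr_low3 + length > 8:
        return None
    return ((1 << length) - 1) << addr_low3
```

For instance, `format(byte_mask(0b011, "long"), "08b")` yields `01111000`, matching the table row for addr<2:0> = 011.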



12.3.7 IPR PROCESSING 

12.3.7.1 MBOX IPRs 

The Mbox maintains the following internal processor registers: 



Table 12-3: Mbox IPRs

                                                                   IPR Address
Register Name                                                      (in hex)

MP0BR (Mbox P0 Base Register) [1]                                  E0
MP0LR (Mbox P0 Length Register) [1]                                E1
MP1BR (Mbox P1 Base Register) [1]                                  E2
MP1LR (Mbox P1 Length Register) [1]                                E3
MSBR (Mbox System Base Register)                                   E4
MSLR (Mbox System Length Register) [1]                             E5
MMAPEN (Map Enable Bit) [1]                                        E6
PAMODE (Address Mode)                                              E7
MMEADR (MME Faulting Address Register) [1]                         E8
MMEPTE (PTE Address Register) [1]                                  E9
MMESTS (status of memory management exception) [1]                 EA
TBADR (address of reference causing TB parity error)               EC
TBSTS (status of TB parity error)                                  ED
PCADR (address of reference causing Pcache parity error)           F2
PCSTS (status of Pcache parity error and PTE hard errors)          F4
PCCTL (control state of Pcache operation)                          F8
PCTAG                                                              01800000..01801FE0
PCDAP                                                              01C00000..01C01FF8

[1] Testability and diagnostic use only, not for software use in normal operation.


The first thirteen IPRs listed above (memory management IPRs) are stored in the S5 pipe in
the register file of the MME_DATAPATH. All other IPRs are stored in the S6 pipe. Note that
when an Mbox IPR, other than a Pcache tag, is addressed, the actual IPR address is received on
M_QUE%S5_VA_H<9:2> (the table above is written such that all addresses start at bit<0>).

The following is the format description of each Mbox IPR. Each format illustrates the format 
visible at the programmer level. The formats do not necessarily illustrate the internal hardware 
storage format. 

Figure 12-16: IPR E0 (hex), MP0BR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1| 0|         system virtual page address of P0 page table         | 0| 0| 0| 0| 0| 0| 0| 0| 0|:MP0BR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 12-17: IPR E1 (hex), MP0LR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|              length of P0 page table in longwords               |:MP0LR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

NVAX CPU Chip Functional Specification, Revision 1.0, February 1991

Barrel Shifter Function

original                   +---+---+---+---+
4 bytes of                 | 4 | 3 | 2 | 1 |
Ebox data                  +---+---+---+---+

barrel shifter             +---+---+---+---+
output when                | 3 | 2 | 1 | 4 |
M_QUE%S5_VA_H<1:0> = 01    +---+---+---+---+

barrel shifter             +---+---+---+---+
output when                | 2 | 1 | 4 | 3 |
M_QUE%S5_VA_H<1:0> = 10    +---+---+---+---+

barrel shifter             +---+---+---+---+
output when                | 1 | 4 | 3 | 2 |
M_QUE%S5_VA_H<1:0> = 11    +---+---+---+---+

The result of this data rotation is that all bytes of data are now in the correct byte positions
relative to memory longword boundaries.

When write data is driven from the EM_LATCH, M_QUE%S5_DATA_H<31:0> is driven by the output
of the barrel shifter so that data will always be properly aligned to memory longword addresses.

Note that, while the M_QUE%S5_DATA_H bus is a longword wide, the B%S6_DATA_H bus is a
quadword wide. B%S6_DATA_H is a quadword wide due to the quadword Pcache access size.
The quadword access size facilitates Pcache and VIC fills. However, for all writes, at most
half of B%S6_DATA_H<63:0> is ever used to write the Pcache since all write commands modify
a longword or less of data. When a write reference propagates from S5 to S6, the longword
aligned data on M_QUE%S5_DATA_H<31:0> is transferred onto both the upper and lower halves of
B%S6_DATA_H<63:0> to guarantee that the data is also quadword aligned to the Pcache and Cbox.
The byte mask corresponding to the reference will control which bytes of B%S6_DATA_H<63:0>
actually get written into the Pcache or Bcache.
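
The rotation, the replication onto both halves of the quadword bus, and the byte-mask selection can be traced with a short Python sketch. It assumes the rotation moves data byte 1 (the least significant byte) into the byte lane named by the low two address bits, which is what longword alignment requires; the function names are hypothetical and the sketch models data movement only, not timing or pipeline stages.

```python
def rotate_longword(data: int, va_low2: int) -> int:
    """Rotate a 32-bit value left by va_low2 bytes (barrel shifter model)."""
    data &= 0xFFFFFFFF
    shift = 8 * (va_low2 & 0b11)
    return ((data << shift) | (data >> (32 - shift))) & 0xFFFFFFFF


def replicate_to_quadword(longword: int) -> int:
    """Drive the same longword onto the upper and lower halves of a 64-bit bus."""
    return (longword << 32) | longword


def merge_bytes(old_quadword: int, new_quadword: int, byte_mask: int) -> int:
    """Write only the byte lanes selected by the 8-bit byte mask."""
    result = old_quadword
    for lane in range(8):
        if byte_mask & (1 << lane):
            lane_bits = 0xFF << (8 * lane)
            result = (result & ~lane_bits) | (new_quadword & lane_bits)
    return result
```

With a longword 0x44332211 written at addr<2:0> = 001 (byte mask 00011110), the rotated and replicated data places bytes 11, 22, 33 and 44 in lanes 1 through 4. The last byte lands in the upper half of the quadword, which is why the longword is driven onto both halves.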


Figure 12-18: IPR E2 (hex), MP1BR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1| 0|         system virtual page address of P1 page table         | 0| 0| 0| 0| 0| 0| 0| 0| 0|:MP1BR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 12-19: IPR E3 (hex), MP1LR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|         length of (2**21) - P1 page table in longwords          |:MP1LR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 12-20: IPR E4 (hex), MSBR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|             physical page address of system page table             | 0| 0| 0| 0| 0| 0| 0| 0| 0|:MSBR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 12-21: IPR E5 (hex), MSLR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|            length of system page table in longwords             |:MSLR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 12-22: IPR E6 (hex), MMAPEN

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| M|:MMAPEN
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 12-4: MMAPEN Field Descriptions

Name    Extent  Type  Description

M       0       RW    When 0, disables Mbox memory management. When 1, enables
                      Mbox memory management.

Figure 12-23: IPR E7 (hex), PAMODE

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|  |:PAMODE
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                              |
                                                                                       MODE --+

Table 12-5: PAMODE Field Descriptions

Name    Extent  Type  Description

MODE    0       RW    When 0, maps addresses from a 30-bit physical address space. When
                      1, maps addresses from a 32-bit physical address space.

Figure 12-24: IPR E8 (hex), MMEADR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                          address associated with recorded MME fault                           |:MMEADR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 12-25: IPR E9 (hex), MMEPTE

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|            PTE address associated with an address corresponding to a modify fault             |:MMEPTE
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 12-26: IPR EA (hex), MMESTS

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|        |  SRC   | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|FAULT| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| M|  |LV|:MMESTS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                                                                                      |
    +-- LOCK                                                                     PTE_REF --+

Table 12-6: MMESTS Field Descriptions

Name        Extent  Type  Description

LV          0       RO    Indicates ACV fault occurred due to length violation.

PTE_REF     1       RO    Indicates ACV/TNV fault occurred on PTE reference corresponding
                          to MMEADR.

M           2       RO    Indicates corresponding reference had write or modify intent.

FAULT       15:14   RO    Indicates nature of memory management fault. See Fault bit
                          encodings below.

SRC         28:26   RO    Complemented shadow copy of LOCK bits. However, the SRC bits
                          do not get reset when the LOCK bits are cleared.

LOCK        31:29   RO,0  Indicates the lock status of MMESTS. See LOCK encodings below.
                          This field is cleared on E%FLUSH_MBOX_H.

See Section 12.5.1.5.3.5 for information on how these fields are encoded.

Figure 12-27: IPR EC (hex), TBADR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                 virtual address associated with the recorded TB parity error                  |:TBADR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 12-28: IPR ED (hex), TBSTS

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|  SRC   | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|     CMD      |  |  |  |  |:TBSTS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                     |  |  |  |
                                                                            EM_VAL --+  |  |  |
                                                                                TPERR --+  |  |
                                                                                   DPERR --+  |
                                                                                       LOCK --+

Table 12-7: TBSTS Field Descriptions

Name        Extent  Type  Description

LOCK        0       WC    Lock Bit. When set, validates TBSTS contents and prevents any
                          other field from further modification. When clear, indicates that no
                          TB parity error has been recorded and allows TBSTS and TBADR
                          to be updated.

DPERR       1       RO    Data Error Bit. When set, indicates a TB data parity error.

TPERR       2       RO    Tag Error Bit. When set, indicates a TB tag parity error.

EM_VAL      3       RO    EM_LATCH valid bit. Indicates if EM_LATCH was valid at the time
                          of the TB parity error detection. This helps the software error
                          handler determine if a write operation may have been lost due to
                          the TB parity error.

CMD         8:4     RO    S5 command corresponding to TB parity error.

SRC         31:29   RO    Indicates the original source of the reference causing TB parity error.

See Section 12.6.4.1 for information on how these fields are encoded.

Figure 12-29: IPR F2 (hex), PCADR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|      quadword physical address associated with the recorded Pcache parity error      | 0| 0| 0|:PCADR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 12-30: IPR F4 (hex), PCSTS

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1|  |  |     CMD      |  |  |  |  |:PCSTS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                |  |                 |  |  |  |
                                                       PTE_ER --+  |                 |  |  |  |
                                                       PTE_ER_WR --+                 |  |  |  |
                                                                         LEFT_BANK --+  |  |  |
                                                                           RIGHT_BANK --+  |  |
                                                                                   DPERR --+  |
                                                                                       LOCK --+

Table 12-8: PCSTS Field Descriptions

Name        Extent  Type  Description

LOCK        0       WC    Lock Bit. When set, validates PCSTS<8:1> contents and prevents
                          modification of these fields. When clear, invalidates PCSTS<8:1>
                          and allows these fields and PCADR to be updated.

DPERR       1       RO    Data Error Bit. When set, indicates a Pcache data parity error.

RIGHT_BANK  2       RO    Right Bank Tag Error Bit. When set, indicates a Pcache tag parity
                          error on the right bank.

LEFT_BANK   3       RO    Left Bank Tag Error Bit. When set, indicates a Pcache tag parity
                          error on the left bank.

CMD         8:4     RO    S6 command corresponding to Pcache parity error.

PTE_ER_WR   9       WC    Indicates a hard error on a PTE DREAD which resulted from a TB
                          miss on a WRITE or WRITE_UNLOCK.

PTE_ER      10      WC    Indicates a hard error on a PTE DREAD.

Note that the state of PCSTS<31:11> are "don't cares" during an IPR write operation.

Figure 12-31: IPR F8 (hex), PCCTL

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1|  |  |  PMM   |  |  |  |  |  |:PCCTL
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                   |  |           |  |  |  |  |
                                                      RED_ENABLE --+  |           |  |  |  |  |
                                                       ELEC_DISABLE --+           |  |  |  |  |
                                                                       P_ENABLE --+  |  |  |  |
                                                                          BANK_SEL --+  |  |  |
                                                                            FORCE_HIT --+  |  |
                                                                                I_ENABLE --+  |
                                                                                   D_ENABLE --+

Table 12-9: PCCTL Field Descriptions

Name           Extent  Type  Description

D_ENABLE       0       RW,0  When set, enables Pcache for all INVAL operations and for
                             all D-stream read/write/fill operations, qualified by other control
                             bits. When clear, forces a Pcache miss on all Pcache D-stream
                             read/write/fill operations. Note, however, that an ACV/TNV/M=0
                             condition overrides a deasserted D_ENABLE in that it will force a
                             Pcache hit condition with D_ENABLE=0.

I_ENABLE       1       RW,0  When set, enables Pcache processing of INVAL, IREAD and I_CF
                             commands. When clear, forces a Pcache miss on IREAD operations
                             and prevents state modification due to an I_CF operation. Note,
                             however, that an ACV/TNV/M=0 condition overrides a deasserted
                             I_ENABLE in that it will force a Pcache hit condition with
                             I_ENABLE=0.

FORCE_HIT      2       RW,0  When set, forces a Pcache hit on all reads and writes when Pcache
                             is enabled for I or D-stream operation.

BANK_SEL       3       RW,0  When set with FORCE_HIT=1, selects the "right bank" of the
                             addressed Pcache index. When clear with FORCE_HIT=1, selects
                             the "left bank" of the addressed Pcache index. BANK_SEL is a don't
                             care when FORCE_HIT=0. NOTE: BANK_SEL never affects bank
                             selection during IPR reads and IPR writes to the Pcache tags or
                             Pcache data parity bits; bank selection for these commands is always
                             determined by the specified IPR address.

P_ENABLE       4       RW,0  When set, enables detection of Pcache tag and data parity errors.
                             When deasserted, disables Pcache parity error detection.

PMM            7:5     RW,0  Specifies Mbox performance monitor mode (see Section 12.10). Note
                             that this field does not control or affect the operation of the Pcache
                             in any way. PMM is placed in PCCTL for the convenience of the
                             hardware implementation.

ELEC_DISABLE   8       RW,0  When set, the Pcache is disabled electrically to reduce power
                             dissipation. NOTE: This bit should only be set when the Pcache
                             is functionally turned off by the deassertion of both I_ENABLE and
                             D_ENABLE. UNPREDICTABLE operation will result when this bit
                             is set when either I_ENABLE or D_ENABLE is also set. Also note
                             that Pcache tag or parity IPRs will not function properly when this
                             bit is unconditionally set.

RED_ENABLE     9       RO    When set, indicates that one or more Pcache redundancy elements
                             are enabled (see Section 12.4.11 for more information).

Note that the state of PCCTL<31:10> are "don't cares" during an IPR write operation.
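
As an aid to reading Table 12-9, the following Python sketch splits a 32-bit PCCTL value into its named fields. The field extents are taken from the table; the helper names `decode_pcctl` and `elec_disable_is_safe` are hypothetical, and the sketch says nothing about how the hardware stores or drives these bits.

```python
# Field name -> (high bit, low bit), per Table 12-9.
PCCTL_FIELDS = {
    "D_ENABLE": (0, 0),
    "I_ENABLE": (1, 1),
    "FORCE_HIT": (2, 2),
    "BANK_SEL": (3, 3),
    "P_ENABLE": (4, 4),
    "PMM": (7, 5),
    "ELEC_DISABLE": (8, 8),
    "RED_ENABLE": (9, 9),
}


def decode_pcctl(value: int) -> dict:
    """Extract each PCCTL field from a 32-bit register value."""
    fields = {}
    for name, (high, low) in PCCTL_FIELDS.items():
        width = high - low + 1
        fields[name] = (value >> low) & ((1 << width) - 1)
    return fields


def elec_disable_is_safe(fields: dict) -> bool:
    """ELEC_DISABLE may only be set with both I_ENABLE and D_ENABLE clear."""
    if not fields["ELEC_DISABLE"]:
        return True
    return not (fields["I_ENABLE"] or fields["D_ENABLE"])
```

Bits <31:10> are ignored by the decoder, in line with their "don't care" status on IPR writes.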

Figure 12-32: IPRs 01800000 thru 01801FE0 (hex), PCTAG

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                            tag                            | 1| 1| 1| 1| 1| 1| P|valid bits | A|:PCTAG
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 12-10: PCTAG Field Descriptions

Name        Extent  Type  Description

A           0       RW    Allocation Bit corresponding to index of this tag.

valid bits  4:1     RW    Valid Bits corresponding to the four data subblocks. PCTAG<4>
                          corresponds to uppermost quadword in block, PCTAG<1>
                          corresponds to lowermost quadword in block.

P           5       RW    Even Tag Parity

tag         31:12   RW    Tag Data

Note that the state of PCTAG<11:6> are "don't cares" during an IPR write operation.

Figure 12-33: IPRs 01C00000 thru 01C01FF8 (hex), PCDAP

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1| 1|      DATA_PARITY      |:PCDAP
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 12-11: PCDAP Field Descriptions

Name          Extent  Type  Description

DATA_PARITY   7:0     RW    Even byte parity corresponding to addressed quadword of data. Bit
                            n represents parity for byte n of addressed quadword.

Note that the state of PCDAP<31:8> are "don't cares" during an IPR write operation.
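
The per-byte parity layout of DATA_PARITY can be illustrated as follows. This is a sketch under an assumed convention: "even parity" is taken to mean that each parity bit is chosen so that the byte plus its parity bit contain an even number of ones. The specification text above does not spell this out, and the function names are hypothetical.

```python
def even_parity_bit(byte: int) -> int:
    """Parity bit that makes the total number of ones (byte + bit) even."""
    return bin(byte & 0xFF).count("1") & 1


def data_parity(quadword: int) -> int:
    """Build an 8-bit DATA_PARITY value; bit n covers byte n of the quadword."""
    parity = 0
    for n in range(8):
        byte = (quadword >> (8 * n)) & 0xFF
        parity |= even_parity_bit(byte) << n
    return parity
```

Under this convention an all-zero quadword has DATA_PARITY equal to zero, and a quadword whose only nonzero byte is byte 0 = 01 (hex) sets only bit 0.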



12.3.7.2 Hardware MBOX IPR Format 

The IPR formats listed above reflect the formats used by the programmer to execute IPR read
and write operations. However, due to the specific structure of the Mbox memory management
datapath, four memory management registers are internally stored in a different format in order
to facilitate all length violation checks and P1 space PTE calculations. The following describes
the hardware formats of these registers:

Figure 12-34: MP0LR Register

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0|              length of P0 page table in longwords               | 0| 0| 0| 0| 0| 0| 0| 0| 0|:MP0LR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 12-35: MP1LR Register

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|     (length of (2**21) - P1 page table in longwords) + lr_bias     | 0| 0| 0| 0| 0| 0| 0| 0| 0|:MP1LR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 12-36: MSLR Register

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0|            length of system page table in longwords             | 0| 0| 0| 0| 0| 0| 0| 0| 0|:MSLR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



The reformatting operation necessary to convert the program-level format to the hardware-level
format is handled by microcode. When IPR writes are done to these registers, the microcode
shifts the length register data 9 bits to the left before delivering the IPR_WRITE reference to
the Mbox. In the MP1LR case, the microcode adds a bias value to the data following the shift
operation. This is done in order to compensate for the "1" which will occur in virtual_addr<30>
position during length check subtraction operations for all P1 space virtual references.

The microcode reverses the format operation to convert the Mbox IPR data back into the 
program-level format during MxLR IPR_READ operations. 
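
The shift-and-bias conversion can be written out as a pair of inverse functions. This is only a sketch of the arithmetic described above: the function names are hypothetical, and the bias is left as a parameter because its value is not given in this section (it defaults to 0, which corresponds to the MP0LR and MSLR case where no bias is added).

```python
MASK32 = 0xFFFFFFFF
LR_SHIFT = 9  # length register data is shifted 9 bits to the left


def lr_to_hardware(program_value: int, bias: int = 0) -> int:
    """Program-level length -> hardware format (shift, then add bias)."""
    return ((program_value << LR_SHIFT) + bias) & MASK32


def lr_to_program(hardware_value: int, bias: int = 0) -> int:
    """Hardware format -> program-level length (remove bias, then shift back)."""
    return ((hardware_value - bias) & MASK32) >> LR_SHIFT
```

A 22-bit program-level length of 3FFFFF (hex) becomes 7FFFFE00 (hex) in hardware format, occupying bits <30:9> as shown in the MP0LR and MSLR figures.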

The hardware format for MP1BR is shown below: 



Figure 12-37: MP1BR Register

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|       system virtual page address of P1 page table - br_bias       | 0| 0| 0| 0| 0| 0| 0| 0| 0|:MP1BR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



Before sending the IPR_WRITE data to the Mbox, the microcode subtracts a different bias value
from the P1 space base register. This is done in order to compensate for the "1" which will occur
in virtual_addr<30> position during P1 space PTE address calculations.

The microcode reverses this format operation to convert the Mbox IPR data back into the 
program-level format during MP1BR IPR_READ operations. 

12.3.7.3 IPR Reads 

IPR reads (internal processor register reads) are issued to the Mbox by the Ebox using the IPR_RD
command. The Ebox issues an IPR_RD in order to obtain the contents of an NVAX internal
processor register existing somewhere in the system other than the Ebox.

12.3.7.3.1 Mbox IPR Reads

When the Ebox issues an IPR_RD to an Mbox S5 IPR, the MME_DATAPATH will respond by
accessing the appropriate register and loading it into the data field of the MME_LATCH. The
MME_LATCH is then validated with an IPR_DATA command. Subsequently, the IPR_DATA
command will execute in the Mbox pipe by passing the requested IPR data back to the Ebox on
M%MD_BUS_H<31:0>, qualified by M%EBOX_DATA_H.

All Mbox S6 IPRs return their data directly on M%MD_BUS_H<31:0>, qualified by M%EBOX_DATA_H,
during the S6 execution of the IPR_RD command.

Any IPR address in the range E0-FF which is not specified above is called a reserved Mbox IPR
(reserved for any future Mbox IPR functional requirements). An IPR_RD to a reserved Mbox IPR
will cause the assertion of M%EBOX_DATA_H in order to unstall the Ebox which is waiting for IPR
data to be returned. Note, however, that the returned data is UNPREDICTABLE.

12.3.7.3.2 Non-Mbox IPR Reads 

The Ebox will issue an IPR_RD command to the Mbox to access IPRs existing in other sections 
of the NVAX computer system. Specifically, IPR_RD commands are issued to address IPRs in the 
Ibox, Cbox, NDAL and memory subsystem. 

IPR_RDs to the Ibox (IPR addresses D0-DF) are treated as NOPs. That is, execution of an Ibox
IPR_RD command performs no Mbox function and does not modify any Mbox state. This behavior
facilitates the Ebox microcode decode of IPR commands by allowing Ibox IPR_RDs to be issued 
to the Mbox even though the Mbox does not play a role in returning Ibox IPR data. 

IPR_RDs which do not address the Ibox or the Mbox are transferred to the Cbox for further 
processing by asserting M%CBOX_REF_ENABLE_L when the IPR_RD is in S6. These IPR_RDs
are handled by the Mbox in a manner similar to a DREAD which misses in the Pcache. The
IPR_RD command is loaded into the DMISS_LATCH as the command is transferred to the Cbox.
DMISS_LATCH state is set to indicate that the reference is not cacheable. Subsequently, the 
Cbox responds to the IPR_RD by sending back the requested data via one D_CF command. The 
IPR_RD sequence is similar to an I/O space READ miss sequence in that only one D_CF command 
is sent rather than four, and the returned data is not loaded in the Pcache even though a D_CF 
command was used to return the data. 
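
The address-based routing of IPR_RD commands described above can be summarized in a short sketch. This is an illustrative Python model only, not the decode logic of the chip: the function name and the returned labels are hypothetical, and it assumes that Mbox IPRs (including reserved ones) occupy the range E0-FF (hex), as the text above implies.

```python
def route_ipr_read(ipr_addr: int) -> str:
    """Classify an IPR_RD seen by the Mbox by its IPR address (hypothetical model)."""
    if 0xD0 <= ipr_addr <= 0xDF:
        # Ibox IPRs: the Mbox treats the IPR_RD as a NOP.
        return "ibox_nop"
    if 0xE0 <= ipr_addr <= 0xFF:
        # Mbox IPRs (S5 or S6), including reserved addresses whose
        # returned data is UNPREDICTABLE.
        return "mbox"
    # Everything else is passed to the Cbox, like a non-cacheable
    # DREAD miss that is answered by a single D_CF.
    return "cbox"


if __name__ == "__main__":
    for addr in (0xD5, 0xE7, 0xA0):
        print(f"IPR {addr:02X} (hex) -> {route_ipr_read(addr)}")
```

The model covers only the address decode; it does not capture the DMISS_LATCH state or the timing of the single D_CF response.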

12.3.7.4 IPR WRITES 

IPR writes (internal processor register writes) are issued to the Mbox by the Ebox using the 
IPR_WR command. The IPR_WR command modifies the contents of an internal processor register
which is located in the Ibox, Mbox, Cbox, NDAL or memory subsystem. The addressed register
is modified using the longword of data associated with the IPR_WR command.

12.3.7.4.1 Mbox IPR Writes 

All Mbox IPRs located in S5 reside in the MME_DATAPATH. These IPRs are written by the 
IPR_WR command during the cycle after the IPR_WR executes in S5. All other Mbox IPRs
reside in S6 and are written during the cycle when the IPR_WR executes in S6. See Table 12-3
for a description of the Mbox IPR registers. 

An IPR_WR to an Mbox reserved IPR causes no action to be taken. 

12.3.7.4.2 Non-Mbox IPR Writes

Unlike Ibox IPR reads, the Mbox plays a role in processing Ibox IPR writes. The Mbox recognizes 
all Ibox IPR writes (addresses D0-DF) and passes the data through the Mbox pipeline onto
M%MD_BUS_H<31:0> qualified by M%IBOX_IPR_WR. The Ibox receives the IPR write data and stores
it in the Ibox IPR specified by information received directly from the Ebox. Processing Ibox IPR 
writes via the Mbox allows the M%MD_BUS_H to be used to transfer Ibox IPR write data without 
the need for a special Ebox-Ibox data bus. 

The Mbox asserts M%CBOX_REF_ENABLE_L to the Cbox when the addressed IPR falls outside of
the Ibox and Mbox IPR address space. This causes the Cbox to continue processing the IPR_WR. 

12.3.8 LOAD_PC 

The LOAD_PC command is used to transfer a new Program Counter value from the Ebox to the
Ibox via the Mbox. This PC value propagates through the Mbox in order to transfer the Ibox data
across M%MD_BUS_H<31:0>. Using the M%MD_BUS_H for this purpose eliminates the need for a
special Ebox-Ibox data bus. 

The LOAD_PC command operates in a manner identical to an Ibox IPR_WR command. The only 
difference between a LOAD_PC and an Ibox IPR_WR command is that no IPR address need be 
decoded. The LOAD_PC command directly specifies the destination of the data as being the Ibox
PC. 

12.3.9 INVALIDATES 

The Pcache must always be a coherent cache with respect to the Bcache. In other words, the 
Pcache must always contain a strict subset of the data cached in the Bcache. If cache coherency 
were not maintained, incorrect computational sequences could result from reading "stale" data 
out of the Pcache in multi-processor system configurations. 

An invalidate is the mechanism by which the Pcache is kept coherent with the Bcache. A Pcache 
invalidate operation occurs when data is displaced from the Bcache or when Bcache data is 
invalidated. The Cbox initiates an invalidate by specifying a hexaword physical address qualified 
by the INVAL command. The INVAL command is latched by the Mbox in the CBOX_LATCH.

Execution of an INVAL command guarantees that data corresponding to the specified hexaword 
address will not be valid in the Pcache. If the hexaword address of the INVAL command
does not match either Pcache tag in the addressed index, no operation takes place. If the
hexaword address matches one of the tags, the four corresponding subblock valid bits are cleared
to guarantee that any subsequent Pcache accesses of this hexaword will miss until this hexaword
is revalidated by a subsequent Pcache fill sequence. If a cache fill sequence to the same hexaword
address is in progress when the INVAL is executed, a bit in the corresponding MISS_LATCH is 
set to inhibit any further cache fills from loading data or validating data for this cache block. 

Also note that an assertion of C%CBOX_HARD_ERR_H during a cache fill command causes the cache
fill operation to be processed as if it were an INVAL operation. 
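
The effect of an INVAL on one Pcache index can be sketched as follows. This is a minimal Python model under stated assumptions: the class and function names are hypothetical, the two ways of the addressed index are passed in directly (the index/tag split of the physical address is not modeled), and the MISS_LATCH bit that inhibits an in-progress fill is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PcacheBlock:
    """One way of a Pcache index: a tag plus four subblock valid bits."""
    tag: int = 0
    valid: List[bool] = field(default_factory=lambda: [False] * 4)


def inval(ways: List[PcacheBlock], tag: int) -> bool:
    """Apply an INVAL to the ways of the addressed index.

    Returns True if a tag matched (its four valid bits are cleared),
    False if neither tag matched (no operation takes place).
    """
    for block in ways:
        if block.tag == tag:
            block.valid = [False] * 4
            return True
    return False


if __name__ == "__main__":
    index = [PcacheBlock(0x12, [True] * 4), PcacheBlock(0x34, [True] * 4)]
    print(inval(index, 0x34), index[1].valid)
```

After a matching INVAL, any later access to that hexaword misses until a fill sequence sets the valid bits again.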

12.3.10 CACHE FILL COMMANDS

See Section 12.3.5.1 for a discussion of cache fill operations. 

12.3.11 MME CHECK COMMANDS

Two commands exist for the purpose of checking references for possible memory management 
exceptions. 

12.3.11.1 MME_CHK 

The function of the MME_CHK command is to obtain the allowed access rights for a specified 
page, and to compare it against an intended access mode specified by M_QUE%S5_AT_H<1:0>. The
MME_CHK command causes a TB access of the PTE corresponding to the MME_CHK address. 
If the PTE is not cached in the TB, the Mbox first fetches the PTE from memory. Once the PTE 
information is accessed, ACV/TNV/M=0 checks are performed. If an ACV, TNV or M=0 fault is 
detected, the appropriate memory management fault response is invoked (See Section 12.5.1.5.3 
for a description of ACV/TNV/M=0 faults). 

12.3.11.2 PROBE 

The PROBE command is used when the microcode must determine the accessibility of a page 
before changing any state (e.g. PROBER, PROBEW, CHMx macro instructions). It functions 
exactly as an MME_CHK command except for three differences: 

• If an ACV, TNV, or M=0 condition is detected, no ACV, TNV, or M=0 response is invoked. That 
is, a PROBE merely detects the condition without actually causing a memory management 
exception. The PROBE command will update MMESTS based on the probe information if 
MMESTS is unlocked. However, a PROBE command will never lock MMESTS. 

• The PROBE command returns status to the Ebox which indicates the nature of any memory 
management condition the PROBE may have detected. 

• If M_QUE%S5_AT_H<1:0>=00 corresponding to the PROBE reference, then the
MME_DATAPATH tb_miss sequence is not invoked when the TB detects a miss.

Status is returned to the Ebox on the M%MD_BUS_H in the following format: 

• M%MD_BUS_H<3> is set when the PROBE reference hits in the TB. 

• M%MD_BUS_H<2> is set when the PROBE reference corresponds to an ACV fault. 

• M%MD_BUS_H<1> is set when the PROBE reference corresponds to a TNV fault.

• M%MD_BUS_H<0> is set when the PROBE reference corresponds to an M=0 fault. 

• All other M%MD_BUS_H bits are undefined. 

NOTE 

One exception to this PROBE status format exists. When M%MD_BUS_H<2:0> = 011,
this code indicates that a TNV has occurred on the PPTE (Process Page
Table Entry) corresponding to the PROBE address. It does NOT mean that a TNV and
M=0 fault have simultaneously occurred on the PROBE address (this would not make
sense).

The following table summarizes all possible PROBE status encodings.

Table 12-12: Probe Status Encodings

M%MD_BUS_H<3:0>   M_QUE%S5_AT_H<1:0>   Probe Status

X000              not 00               No fault
X001              not 00               Modify fault
X010              not 00               TNV fault
X011              not 00               TNV fault on PPTE reference
X100              not 00               ACV fault
X101              not 00               Illegal status (will never be generated)
X110              not 00               Illegal status (will never be generated)
X111              not 00               Illegal status (will never be generated)
0XXX              00                   PROBE missed in TB. Lower three bits are a don't care.
1XXX              00                   PROBE hit in TB. Lower three bits are a don't care.



If memory management is turned off (i.e. MAPEN=0) execution of the PROBE command returns 
a status of M%MD_BUS_H<2:0>=0 indicating that no fault was detected (M%MD_BUS_H<3> will vary
based on hit/miss TB status). 
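
The status encodings can be decoded as in the following sketch. It is an illustrative Python model, not microcode: the function and dictionary names are hypothetical, and it assumes that the fault encodings on M%MD_BUS_H<2:0> are meaningful when M_QUE%S5_AT_H<1:0> is nonzero, while only the TB hit bit M%MD_BUS_H<3> is meaningful when AT = 00.

```python
# Fault codes carried on M%MD_BUS_H<2:0> (ACV, TNV, M=0 bits).
PROBE_FAULTS = {
    0b000: "no fault",
    0b001: "modify fault (M=0)",
    0b010: "TNV fault",
    0b011: "TNV fault on PPTE reference",
    0b100: "ACV fault",
}


def decode_probe_status(md_bus: int, at: int) -> str:
    """Interpret the PROBE status returned on M%MD_BUS_H.

    md_bus: 32-bit bus value; bits above <3> are undefined and ignored.
    at:     access type, M_QUE%S5_AT_H<1:0>, of the PROBE reference.
    """
    status = md_bus & 0xF
    if at == 0:
        # AT = 00: only the TB hit/miss bit carries information.
        return "TB hit" if status & 0b1000 else "TB miss"
    code = status & 0b111
    if code not in PROBE_FAULTS:
        raise ValueError(f"illegal PROBE status {code:03b}")
    return PROBE_FAULTS[code]


if __name__ == "__main__":
    print(decode_probe_status(0b0011, at=1))
    print(decode_probe_status(0b1000, at=0))
```

Note that 011 is decoded as a single condition (TNV on the PPTE), never as a simultaneous TNV and M=0 fault.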

12.3.12 TB Fills 
12.3.12.1 TB Tag Fills 

The TB_TAG_FILL command is used in conjunction with the TB_PTE_FILL command to cache 
a PTE in the TB. The data associated with the TB_TAG_FILL command corresponds to a virtual 
byte address in some virtual page. The TB_TAG_FILL command causes the page address on 
M_QUE%S5_VA_H<31:9> of the TB_TAG_FILL data to be written into the tag field of the TB entry 
pointed to by the NLU TB allocation pointer (see Section 12.5.1.3 for information about the NLU 
TB allocation pointer). The TB valid bit (TBV) of the entry is cleared. 

When TB_TAG_FILLs occur from the MME_LATCH, the tag data is driven onto M_QUE%S5_VA_H
in the following format: 

Figure 12-38: TB_TAG_FILL Format (from MME_LATCH)

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                VPN                                 | 0| 0| 0| 0| 0| 0| 0| 0| 0|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



Table 12-13: TB_TAG_FILL Definition

Name Extent Type Description 

VPN 31:9 W Virtual page address used to fill a TB tag field. 



During the TB_TAG_FILL, the TB logic will automatically generate even tag parity corresponding 
to PTE<31:9>. This parity will be written into the TB during the TB_TAG_FILL operation. 

When TB_TAG_FILLs occur from the Ebox, the tag data is supplied from the address field of the
EM_LATCH and is driven onto M_QUE%S5_VA_H in the following format:

Figure 12-39: TB_TAG_FILL Format (from EM_LATCH): IPR 7E (hex), MTBTAG 



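 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                VPN                                 | 0| 0| 0| 0| 0| 0| 0| 0|TP| :MTBTAG
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+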









Table 12-14: MTBTAG Field Descriptions

Name   Extent   Type   Description

TP     0        W      Even tag parity bit.
VPN    31:9     W      Virtual page address used to fill a TB tag field.





In this case, the even tag parity corresponding to the VPN is specified in bit<0> of the data 
field for the TB_TAG_FILL. This mechanism allows correct or incorrect parity to be deliberately
written into the TB tag array for testability purposes by invoking the TB_TAG_FILL operation 
through the appropriate MTPR instruction. 
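
As an illustration of how a test program might build the MTBTAG longword, the sketch below computes the TP bit for a given virtual address. It is a hypothetical Python helper, not part of the specification: the function names are invented, and it assumes that "even parity" means the TP bit is chosen so that the VPN bits plus TP contain an even number of ones.

```python
def parity_of(value: int) -> int:
    """Return 1 if value has an odd number of one bits, else 0."""
    return bin(value).count("1") & 1


def make_mtbtag(virtual_addr: int, corrupt_parity: bool = False) -> int:
    """Build an MTBTAG longword: VPN in <31:9>, zeros in <8:1>, TP in <0>.

    With corrupt_parity=True the TP bit is inverted, which writes
    deliberately bad tag parity for testability purposes.
    """
    vpn_field = virtual_addr & 0xFFFFFE00   # keep VA<31:9>, clear <8:0>
    tp = parity_of(vpn_field >> 9)           # even parity over the VPN (assumed convention)
    if corrupt_parity:
        tp ^= 1
    return vpn_field | tp


if __name__ == "__main__":
    print(hex(make_mtbtag(0x00000200)))
    print(hex(make_mtbtag(0x00000200, corrupt_parity=True)))
```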



12.3.12.2 TB PTE Fills

The TB_PTE_FILL operation drives the PTE data onto M_QUE%S5_VA_H<31:0> so that this
data can be written into the data array of the TB. The data is written into the entry pointed 
to by the NLU TB allocation pointer. The TB valid bit (TBV bit) of the entry is set (Note 
that a TB_TAG_FILL command will not be issued by the Mbox if PTE<31> is clear in order to 
guarantee that only validated PTEs are ever cached in the TB). The NLU TB allocation pointer 
is incremented after the fill is done. 
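
The pairing of TB_TAG_FILL and TB_PTE_FILL can be modeled as below. This is a simplified Python sketch under stated assumptions: the class name, the default table size, and the wrap-around pointer are hypothetical (the actual NLU allocation behavior is described in Section 12.5.1.3), and parity is not modeled.

```python
class TBFillModel:
    """Toy model of TB tag and PTE fills at the NLU allocation pointer."""

    def __init__(self, size: int = 4):
        self.tags = [0] * size
        self.ptes = [0] * size
        self.valid = [False] * size
        self.nlu = 0  # entry the next fill pair will use

    def tb_tag_fill(self, virtual_addr: int) -> None:
        # Write VA<31:9> into the tag field and clear the TB valid bit.
        self.tags[self.nlu] = (virtual_addr >> 9) & 0x7FFFFF
        self.valid[self.nlu] = False

    def tb_pte_fill(self, pte: int) -> None:
        # Only validated PTEs (PTE<31> set) are ever cached.
        if not pte & (1 << 31):
            raise ValueError("PTE<31> must be set for a TB_PTE_FILL")
        # Keep PTE<30:26> and PTE<22:0>, the bits stored in the TB array.
        self.ptes[self.nlu] = pte & 0x7C7FFFFF
        self.valid[self.nlu] = True
        self.nlu = (self.nlu + 1) % len(self.tags)  # assumed simple wrap


if __name__ == "__main__":
    tb = TBFillModel()
    tb.tb_tag_fill(0x12345678)
    tb.tb_pte_fill(0xFC000123)
    print(tb.valid[0], hex(tb.tags[0]), hex(tb.ptes[0]), tb.nlu)
```

Because the tag fill clears the valid bit, the entry cannot hit between the two halves of the fill pair.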

When TB_PTE_FILLs occur from the MME_LATCH, the PTE data is driven onto M_QUE%S5_VA_H
during a TB_PTE_FILL in the following format: 

Figure 12-40: TB_PTE_FILL Data Format (from MME_LATCH)



 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1|   PROT    | M| 0| 0| 0|                                PFN                                 |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



Table 12-15: TB_PTE_FILL Definition

Name   Extent   Type   Description

PFN    22:0     W      Page frame address
0      25:23           Forced to 0 by MME_LATCH
M      26       W      PTE modify bit
PROT   30:27    W      PTE protection field.
1      31              Valid bit of PTE (must be a "1". See below)



Only bits <30:26>, <22:0> and the corresponding PTE parity bit are actually written into the 
TB array during a TB_PTE_FILL. TB_PTE_FILLs from the MME_LATCH will only be issued
for validated PTEs. Therefore, PTE<31> will always be set. The TB logic will automatically 
generate even parity to be written during the fill corresponding to PTE<31:0>. Note that the 
parity generator includes PTE<31> in this calculation even though this bit is not written into the 
TB. Since PTE<31> is always a "1" during a TB_PTE_FILL, the stored parity can be thought of 
as odd parity on bits <30:0>. 

When TB_PTE_FILLs occur from the EM_LATCH, the PTE data is driven onto M_QUE%S5_VA_H
during a TB_PTE_FILL in the following format: 

Figure 12-41: TB_PTE_FILL Data Format (from EM_LATCH): IPR 7F (hex), MTBPTE

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1|   PROT    | M| 0| P| 0|                                PFN                                 | :MTBPTE
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 12-16: MTBPTE Field Descriptions

Name   Extent   Type   Description

PFN    22:0     W      Page frame address
0      23              Assumed to be a "0" for parity calculation.
P      24       W      User-settable even parity corresponding to PTE<31:26> and
                       PTE<22:0>.
0      25              Assumed to be a "0" for parity calculation.
M      26       W      PTE modify bit
PROT   30:27    W      PTE protection field.
1      31              Assumed to be a "1" for parity calculation. (See below)



Bits <30:26>, <22:0> are written into the TB array during a TB_PTE_FILL. Bit<24> is interpreted 
as the corresponding PTE parity and is directly written into the TB as such. This gives the user 
the flexibility of writing correct or incorrect PTE parity for testability purposes. Note however 
that while PTE<31> is not written into the TB, it must be assumed that this bit is set when the 
user calculates even parity on PTE<31:0>. Similarly, PTE<25> and PTE<23> must be cleared 
for proper parity calculation. 
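
The parity rule above can be illustrated with a short sketch that builds an MTBPTE longword. This is a hypothetical Python helper, not taken from the specification: the function name and arguments are invented, and it assumes that "even parity" means the P bit makes the total number of ones in PTE<31:26>, PTE<22:0> and P even, with PTE<31> counted as a "1" and PTE<25> and PTE<23> as "0".

```python
def make_mtbpte(pfn: int, prot: int, modify: int, corrupt_parity: bool = False) -> int:
    """Build an MTBPTE longword with the P bit in <24>.

    pfn:    page frame number, PTE<22:0>
    prot:   protection field, PTE<30:27>
    modify: modify bit, PTE<26>
    """
    word = (1 << 31)                       # PTE<31> assumed set for parity
    word |= (prot & 0xF) << 27
    word |= (modify & 1) << 26
    word |= pfn & 0x7FFFFF                 # bits <25:23> stay clear
    p = bin(word).count("1") & 1           # even parity over <31:26> and <22:0>
    if corrupt_parity:
        p ^= 1                             # deliberately wrong, for testability
    return word | (p << 24)


if __name__ == "__main__":
    print(hex(make_mtbpte(pfn=0x000001, prot=0x0, modify=0)))
    print(hex(make_mtbpte(pfn=0x000000, prot=0x0, modify=0)))
```

Passing corrupt_parity=True yields a PTE that should provoke a TB parity error when the entry is later used.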

See Section 12.5.1.5.2 for a description of TB fill sequences. 



12.3.13 TBIS 



The TBIS (TB Invalidate Single) command invalidates the PTE entry corresponding to the 
specified virtual address, providing that the PTE is cached in the TB. If the PTE is not cached in 
the TB, no action is taken. 



12.3.14 TBIP 

The TBIP (TB Invalidate Process) command invalidates all the PTE entries corresponding to 
P0 or P1 space translations which are currently cached in the TB. This command is used when
the CPU changes process context. It allows a new process translation state to be set up for the 
new process context without being polluted by old translations corresponding to the old process 
context. TBIP does not invalidate PTEs corresponding to system space translations because these 
translations are valid across all processes. 

12.3.15 TBIA

The TBIA (TB Invalidate All) command invalidates all PTE entries in the TB and resets the NLU 
TB allocation pointer to a known state. This is done for CPU initialization purposes, when the 
operating system reconfigures its system space translations, and when the Mbox clears the TB 
after encountering a TB parity error. 
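
The three invalidate commands differ only in which entries they select, as the following sketch shows. It is an illustrative Python model with hypothetical names; it assumes the tag holds VA<31:9>, that process space is identified by VA<31> = 0 (P0 and P1) per the VAX address space layout, and that the "known state" of the NLU pointer after TBIA is zero.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TBEntry:
    vpn: int            # tag: VA<31:9>
    valid: bool = True


class TBInvalidateModel:
    def __init__(self, entries: List[TBEntry]):
        self.entries = entries
        self.nlu = 0

    def tbis(self, virtual_addr: int) -> None:
        """Invalidate the single entry mapping virtual_addr, if cached."""
        for e in self.entries:
            if e.valid and e.vpn == virtual_addr >> 9:
                e.valid = False

    def tbip(self) -> None:
        """Invalidate all process-space (P0/P1) entries; keep system space."""
        for e in self.entries:
            if (e.vpn >> 22) & 1 == 0:   # VA<31> clear
                e.valid = False

    def tbia(self) -> None:
        """Invalidate everything and reset the allocation pointer."""
        for e in self.entries:
            e.valid = False
        self.nlu = 0                      # assumed reset value


if __name__ == "__main__":
    tb = TBInvalidateModel([TBEntry(0x00000200 >> 9), TBEntry(0x80000000 >> 9)])
    tb.tbip()
    print([e.valid for e in tb.entries])
```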

12.3.16 STOP_SPEC_Q 

The STOP_SPEC_Q command is sent by the Ibox to inform the Mbox that no subsequent Ibox 
specifier references should be processed until the Ebox sends the proper synchronization. This 
command decrements the SPEC_Q_SYNC_CTR. In all other respects, it is treated as a NOP by 
the Mbox. See Section 12.3.20 to understand the context of the use of STOP_SPEC_Q. 

12.3.17 UNALIGNED REFERENCES 

An unaligned reference is a D-stream memory read or memory write reference that refers to 
data which crosses a quadword-aligned boundary (note that unaligned I/O space references 
are defined to cause UNPREDICTABLE behavior). A quadword boundary is the appropriate 
address resolution because the Pcache and Cbox read and write aligned quadwords of data. If a 
reference crosses a quadword-aligned boundary, the unaligned reference must be translated into 
two references— one for each distinct quadword memory access. 

Detection of an unaligned reference is done in S5 by the unaligned detection logic and is a function
of M_QUE%S5_VA_H<2:0> and M_QUE%S5_DL_H<1:0> of the S5 reference packet. The following table
summarizes all possible unaligned configurations:

DL          ADDR<2:0>

word        111
longword    101, 110, 111
quadword    101, 110, 111



When an unaligned D-stream read, STORE or WRITE is detected, the Mbox does the following: 

• The address of the unaligned reference is used to reference the aligned quadword 
corresponding to the lower portion of the data. 

• The Mbox generates a second reference corresponding to the aligned quadword corresponding 
to the upper portion of the reference. 

• In the case of reads, once both references have been executed, the requested data is extracted 
from the two quadwords and aligned to M%MD_BUS_H<31:0>.

The implication of unaligned processing by the Mbox is that unaligned references are functionally 
invisible to the Ibox and Ebox. That is, the Ibox and Ebox can perform reads and writes without 
regard to alignment. Note that Mbox-generated references and I-stream reads are always aligned 
references. 
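
The table of unaligned configurations reduces to a single comparison, sketched below in Python. The names are hypothetical; the sketch assumes that a quadword reference is checked as a longword access, which is consistent with the table (a DL=quadword reference returns only a longword of data).

```python
# Bytes actually accessed per reference for each data length (DL).
# Quadword references are treated as longword accesses (assumption
# drawn from the table of unaligned configurations).
ACCESS_BYTES = {"byte": 1, "word": 2, "longword": 4, "quadword": 4}


def is_unaligned(virtual_addr: int, dl: str) -> bool:
    """True if the access crosses a quadword-aligned boundary."""
    return (virtual_addr & 0b111) + ACCESS_BYTES[dl] > 8


if __name__ == "__main__":
    for dl in ("byte", "word", "longword", "quadword"):
        offsets = [format(a, "03b") for a in range(8) if is_unaligned(a, dl)]
        print(dl, offsets)
```

Running the loop reproduces the table: no byte access is ever unaligned, a word is unaligned only at offset 111, and longword and quadword references are unaligned at offsets 101, 110 and 111.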

12.3.17.1 Unaligned Reads

When an S5 read is determined to be unaligned, the S5 command packet is loaded into the 
VAP_LATCH. However, M_QUE%S5_VA_H<31:0> is not directly loaded. Instead the quadword
incrementor associated with the VAP_LATCH increments the M_QUE%S5_VA_H quadword address.
This new address is loaded and is used to reference the upper half of the unaligned data. 

Meanwhile, the current S5 read command is allowed to execute. When this read successfully 
completes in S5, the VAP_LATCH is validated indicating that it contains the upper half of 
the unaligned reference and that it can now be executed. Subsequently, the VAP_LATCH 
reference will be processed in the S5 pipe. Once it successfully completes in S5, the VAP_LATCH 
is invalidated. Note that if the read originated from the EM_LATCH, the EM_LATCH was
invalidated as the first reference of the unaligned pair successfully completed. However, if the
read came from the SPEC_QUEUE, the SPEC_QUEUE is not invalidated until the VAP_LATCH
reference successfully completes (See Section 12.3.19.1). 

When data for the first read is available on B%S6_DATA_H<63:0> (either from the Pcache or the
Cbox), the data is rotated by the MD_BUS_ROTATOR based on M%S6_PA_H<2:0> and latched
in the MD_BUS_ROTATOR latches. Since the VAP_LATCH read was executed after the initial
read, its data is guaranteed to be available during some cycle after the initial data is latched by
the MD_BUS_ROTATOR. When the second data arrives in S6, the data is rotated by the same
number of bytes as was done for the first reference. The lower one, two, or three bytes of the
M%MD_BUS_H are then driven from the MD_BUS_ROTATOR latches which contain valid data from
the first reference while the remaining bytes of M%MD_BUS_H are driven directly from the rotator.
The effect of this sequence is to assemble the data from the two reads in a right-justified manner
on the M%MD_BUS_H. When the assembled data is driven, M%IBOX_DATA_L and/or M%EBOX_DATA_H
are asserted to indicate the destination of the data. 
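
The assembly of the two quadwords can be illustrated with a small behavioral sketch. This is a Python model under stated assumptions, not a description of the rotator circuit: the function names are hypothetical, memory is treated as little-endian (as on the VAX), and the result is the full longword on M%MD_BUS_H<31:0>; a word read would use only the low 16 bits.

```python
MASK64 = (1 << 64) - 1


def rotate_right_bytes(quadword: int, nbytes: int) -> int:
    """Rotate a 64-bit value right by nbytes bytes (models the rotator)."""
    shift = 8 * (nbytes % 8)
    if shift == 0:
        return quadword & MASK64
    return ((quadword >> shift) | (quadword << (64 - shift))) & MASK64


def assemble_unaligned_read(first_qw: int, second_qw: int, addr: int) -> int:
    """Merge two aligned quadwords into right-justified longword data.

    first_qw:  quadword containing the lower portion of the data
    second_qw: next quadword (address incremented by eight)
    addr:      original address; only addr<2:0> (5, 6 or 7) is used
    """
    rot = addr & 0b111
    latched = rotate_right_bytes(first_qw, rot)   # held in the rotator latches
    direct = rotate_right_bytes(second_qw, rot)   # same rotation for second data
    low_bytes = 8 - rot                           # 3, 2 or 1 bytes from first read
    low_mask = (1 << (8 * low_bytes)) - 1
    return ((latched & low_mask) | (direct & ~low_mask)) & 0xFFFFFFFF


if __name__ == "__main__":
    q0 = int.from_bytes(bytes(range(0, 8)), "little")
    q1 = int.from_bytes(bytes(range(8, 16)), "little")
    print(hex(assemble_unaligned_read(q0, q1, 5)))
```

In the example, memory byte n holds the value n, so a longword read at offset 5 should return bytes 5, 6, 7 and 8.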

The RTY_DMISS_LATCH always contains a physical address because it stores retried reads from 
the S6 pipe. The implication of this fact on unaligned reads is that an unaligned sequence is never 
initiated from the RTY_DMISS_LATCH because the RTY_DMISS_LATCH address is physical.
If an unaligned reference crosses a page boundary, the physical address of the second reference 
is not guaranteed to be a quadword incremented version of the first reference since the first and 
second references are associated with different address translations. 

12.3.17.2 Unaligned Writes

Like unaligned reads, unaligned writes are processed by breaking the reference into two aligned
quadword references such that the VAP_LATCH always generates and stores the upper portion.
When this EM_LATCH command successfully completes in S5, the VAP_LATCH generates the
upper portion of the unaligned write reference in the same manner as an unaligned read. The
data driven on M_QUE%S5_DATA_H<31:0> from the EM_LATCH byte rotator during the first write
is latched in the VAP_LATCH. Thus, when the VAP_LATCH write executes, the same data is
again driven onto M_QUE%S5_DATA_H<31:0>. It is the different byte masks and addresses of the
two aligned writes which cause the proper bytes to be written into the proper bytes of memory. 

12.3.17.3 Byte Mask Generation for Unaligned Writes

The byte mask generator must understand whether a given reference is the first or second 
reference of an unaligned reference pair in order to generate the appropriate byte mask. 
M_QUE%S5_QUAL_H<3> is used to determine this. 

The following table illustrates examples of the behavior of the byte mask generator for aligned 
and unaligned writes: 



Table 12-17: Byte Mask Logic for Aligned and Unaligned References

ref   ADDR<2:0>   BM (DL=byte)   BM (DL=word)   BM (DL=long)   BM (DL=quad)

1st   000         00000001       00000011       00001111       00001111
2nd   000         -              -              -              -
1st   001         00000010       00000110       00011110       00011110
2nd   001         -              -              -              -
1st   010         00000100       00001100       00111100       00111100
2nd   010         -              -              -              -
1st   011         00001000       00011000       01111000       01111000
2nd   011         -              -              -              -
1st   100         00010000       00110000       11110000       11110000
2nd   100         -              -              -              -
1st   101         00100000       01100000       11100000       11100000
2nd   101         -              -              00000001       00000001
1st   110         01000000       11000000       11000000       11000000
2nd   110         -              -              00000011       00000011
1st   111         10000000       10000000       10000000       10000000
2nd   111         -              00000001       00000111       00000111



Since the VAP_LATCH always increments the virtual address by eight, the lower three bits of
the VAP_LATCH address will always be the same as the original address. However, the lower
three bits of the address sent to the Cbox (M%C_S6_PA<2:0>) are always zeroed on the second half
of an unaligned reference in order that the address that is sent off chip is consistent with the 
corresponding byte mask value. 
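
The byte mask values in the table follow from shifting a run of ones by the address offset and splitting the result at the quadword boundary. The Python sketch below shows this; the names are hypothetical, and it assumes (as the table indicates) that DL=quad produces the same mask as DL=long.

```python
# Number of bytes written per reference for each data length (DL).
WRITE_BYTES = {"byte": 1, "word": 2, "long": 4, "quad": 4}


def byte_mask(addr: int, dl: str, second_half: bool = False) -> int:
    """Return the 8-bit byte mask for one reference of a write.

    second_half selects the mask for the second reference of an
    unaligned pair; it is 0 when the write does not cross the
    quadword boundary (no second reference exists).
    """
    span = ((1 << WRITE_BYTES[dl]) - 1) << (addr & 0b111)
    return (span >> 8) if second_half else (span & 0xFF)


if __name__ == "__main__":
    for offset in (5, 6, 7):
        first = format(byte_mask(offset, "long"), "08b")
        second = format(byte_mask(offset, "long", True), "08b")
        print(format(offset, "03b"), first, second)
```

Because the second reference always starts at byte 0 of the next quadword, zeroing the low three address bits sent to the Cbox keeps the address consistent with this mask.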

12.3.17.4 Unaligned Destination Specifier Writes

When an unaligned DEST_ADDR or unaligned DREAD_MODIFY command is latched in the
SPEC_QUEUE, the unaligned detection logic flags the unaligned condition and thus, the reference 
is split into two aligned references by the mechanism described previously. As each one of the 
pair of commands executes, one entry will be added to the PA_QUEUE. 

When the corresponding data arrives in the EM_LATCH via the STORE command, the data 
is rotated based on the lower two address bits output from the PA_QUEUE. The rotated 
data is then matched up with the reference driven from the PA_QUEUE to form a newly 
assembled WRITE command. Since the reference driven from the PA_QUEUE indicates 
that M_QUE%S5_QUAL_H<4>=1 (i.e. this reference is the first part of an unaligned pair), the
VAP_LATCH latches and validates a copy of the STORE command with the rotated STORE 
data. When this newly assembled WRITE command successfully completes in S5, the bottom 
entry of the PA_QUEUE is retired. When the VAP_LATCH subsequently executes the second 
STORE reference, the second entry in the PA_QUEUE is matched with it and retired. In effect, 
the STORE data is split into two STORE commands so that each STORE is merged with each 
PA_QUEUE entry to form two WRITE commands. 

12.3.17.5 Implication of Ebox Unaligned References on M%EM_LAT_FULL_H

The EM_LATCH is invalidated whenever the EM_LATCH reference successfully completes in S5.
However, if the EM_LATCH reference was unaligned, the second half of the reference still awaits
processing in the VAP_LATCH even though the EM_LATCH has been invalidated. Clearing the
EM_LATCH while the second half of an unaligned Ebox reference is still pending could release
the EM_STALL condition causing the Ebox microcode to advance even though the Mbox has not 
completed processing of the second part of the previous unaligned reference. 

This scenario is undesirable since the Ebox microcode makes synchronization assumptions based
on references being retired from the EM_LATCH. To preserve these assumptions, the Mbox
will assert M%EM_LAT_FULL_H until both halves of the unaligned reference have been retired 
even though the EM_LATCH will have been invalidated earlier. Note that this applies to both 
unaligned reads and unaligned writes. 

12.3.18 ABORTING REFERENCES 

The Mbox abort operation is used to cancel the current S5 operation. When an abort is executed, 
the S5 state, which would normally be updated due to execution of the current S5 reference, is not 
updated. The aborted S5 reference is not propagated into S6. Instead, a NOP is introduced into 
the S6 pipe. In effect, an aborted S5 reference is equivalent to a NOP command being executed 
in S5. 

Note that the abort operation should be viewed as only cancelling the current execution of 
a reference. In most cases, aborting an operation does not invalidate the existence of the 
corresponding reference, which will still be stored in one of the reference sources and retried
at a later point. 

The abort operation is executed when M_S5C_ABT%ABORT_L is asserted. The following changes to
Mbox state are inhibited during the cycle in which M_S5C_ABT%ABORT_L is asserted: 

• The reference source which drove the aborted command into S5 does not invalidate the 
corresponding command. Thus, the reference still exists to be retried during a subsequent 
cycle. 

NOTE

There are two exceptions to this rule. The CBOX_LATCH is always invalidated
after it drives a command into S5. The EM_LATCH will be invalidated if the Ebox 
has explicitly requested it to be (via the E%EM_ABORT_L signal). 

• Loading the PA_QUEUE with a DEST_ADDR or DREAD_MODIFY command is inhibited. 
Emptying the PA_QUEUE when a STORE command is driven in S5 is inhibited. 

• If the unaligned detection logic detected an unaligned reference during the aborted cycle, the 
VAP_LATCH is not validated to contain the second portion of the unaligned sequence. 

12.3.18.1 Conditions for Aborting References 

In general, references are aborted for five reasons: 

• The reference is aborted to prevent a reference order restriction from being violated (see
Section 12.3.18.1.1).

• The reference is aborted because insufficient hardware resources are available to complete 
processing of the current command. 

• The reference is aborted because a memory management operation must be performed prior 
to execution of the current reference. 

• The reference is aborted in order to avoid a deadlock condition related to unaligned references. 

• The reference is aborted due to an external flush condition. 

The following describes the specific conditions which can invoke an abort operation for each of 
the five categories listed above. 

12.3.18.1.1 Aborting to Maintain Reference Order Restrictions 

• Aborting D-stream hits under D-stream misses: Consider the case where two D-stream reads
are executed in back-to-back cycles. In this case, the second D-stream read will be aborted 
in S5 if the first D-stream read misses in the Pcache in S6. This prevents the possibility of 
propagating the second read into S6 and having it hit and return data before the first read 
returns data. 

Note that this condition applies to all D-stream "read-like" references (i.e. references which
return data to the Ebox). Specifically, this condition applies to DREAD, DREAD_MODIFY, 
DREAD_LOCK, IPR_RD, and PROBE commands. 

• Aborting I-stream hits under I-stream misses: The Mbox initiates an IREAD sequence 
by issuing consecutive IREAD commands via the I-stream "fill forward" mode (See 
Section 12.3.5.2.1). If the first IREAD in this sequence misses in the Pcache in S6 while 
the second IREAD is executing in S5, the second IREAD is aborted. This is done to handle 
I-stream reads in an analogous fashion to D-stream reads. 

• Aborting to preserve order of Ibox reads relative to Ebox writes: As explained previously, the 
PA_QUEUE is the structure used to store pending destination specifier addresses until the 
Ebox can supply the corresponding data to complete the write reference. Once the Ebox 
supplies the data, the write executes and the corresponding entry in the PA_QUEUE is 
invalidated. 

The comparator function built into the PA_QUEUE is used to detect address matches on
bits<8:3> between Ibox D-stream read references and any of the valid PA_QUEUE entries. 

Consider the example shown in Figure 12-14. 

In this example, the Ibox would decode the destination specifier of the first MOVL instruction 
which causes a DEST_ADDR command to be sent to the PA_QUEUE. Subsequently, the Ibox 
would decode the first specifier of the second MOVL, causing a read to be issued to the Mbox. 
When this read is started in the S5 pipe, a PA_QUEUE comparator will detect an address 
conflict between the read and the pending destination address. As a result, the read is aborted 
and is not successfully executed until the write completes. Thus, all reads originating from 
the SPEC_QUEUE are aborted if the PA_QUEUE detects an address conflict. 

Note that the PA_QUEUE must always detect physical address conflicts. Detecting virtual
address conflicts is not sufficient since two or more different virtual pages could be mapped
to the same physical page causing two or more different virtual addresses to conflict on the 
same physical longword. 

Also note that the PA_QUEUE is capable of detecting false conflicts because only address 
bits <8:3> are compared rather than the entire address. Performance data indicates that 
the number of false conflicts using addr<8:3> is sufficiently low to have an insignificant 
performance degradation. Bits <8:3> are used since they are untranslated address bits and,
therefore, are immediately available for use without waiting for the address to be translated. 
The lower three bits are not used because the PA_QUEUE must detect conflicts at quadword 
resolution. The following diagram illustrates why quadword resolution must be used: 

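Figure 12-42: PA_QUEUE conflict detection 

    <--------------- memory aligned quadword --------------->
       7      6      5      4      3      2      1      0
    +------+------+------+------+------+------+------+------+
    |      |      |      |      |      |      |      |      |
    +------+------+------+------+------+------+------+------+
                  |<------------------------->|  PA_QUEUE entry addresses this longword
                                                 PA_QUEUE addr<2:0>: 010
                     ^
                     |                           A DREAD is issued which addresses this byte
                     +                           DREAD addr<2:0>: 101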



The diagram above illustrates eight bytes of memory within a memory aligned quadword. In 
this example, the PA_QUEUE contains a destination address which references a longword. 
While this reference is not longword aligned, it is handled as an aligned reference because the 
reference does not cross an aligned quadword boundary. Consider the byte DREAD shown 
above which is issued by the SPEC_QUEUE and is executed in S5 in the presence of the 
PA_QUEUE entry. While a PA_QUEUE address conflict clearly exists on the fifth byte within 
this quadword, the lower three bits of the PA_QUEUE address do not match the lower three 
bits of the DREAD address. Thus, the lower three bits cannot be used for the purposes 
of PA_QUEUE conflict detection. 

DREAD_MODIFY references with DL=quadword pose a special problem for the PA_QUEUE 
conflict logic. Quadword memory operands are requested by the Ibox by issuing a D-stream 
reference with DL=quadword followed by another D-stream reference with DL=longword. 
The first reference causes the lower half of the quadword operand to be returned on 
M%MD_BUS_H<31:0> (i.e. all quadword DREADs only return a longword of data). The 
second reference addresses the upper half of the quadword causing the upper half of the 
operand to be returned on M%MD_BUS_H<31:0>. If the quadword operand is aligned, both 
the quadword and the longword references have the same quadword address. Thus, when 
the DREAD_MODIFY longword reference is executed in S5, a PA_QUEUE address conflict 
could be detected against the DREAD_MODIFY quadword reference previously loaded. If this 
were to happen, a deadlock state would exist within the NVAX chip because the corresponding 
STORE data for the quadword operand cannot be generated to clear the PA_QUEUE until 
the Ebox receives the entire requested quadword operand, which cannot happen as long as 
a PA_QUEUE address conflict is detected. A similar deadlock situation could result from an 
unaligned DREAD_MODIFY quadword operand. 

To avoid this deadlock problem, the PA_QUEUE control logic stores a state bit for each entry 
to indicate whether the DL is quadword. If the last entry loaded contains a quadword, the 
PA_QUEUE address conflict logic associated with that PA_QUEUE entry is inhibited. This 
avoids deadlock by preventing the PA_QUEUE conflict logic from detecting a conflict between 
the first half and the second half of the same DREAD_MODIFY quadword specifier. 
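
The comparison on bits <8:3> and the quadword inhibit described above can be modeled in a few lines of Python. This is a behavioral sketch for illustration only and is not derived from the actual comparator circuit. The names PaQueueEntry, CONFLICT_MASK and pa_queue_conflict are hypothetical, and the queue is assumed to be a list ordered oldest entry first.

```python
from dataclasses import dataclass
from typing import List

CONFLICT_MASK = 0x1F8  # address bits <8:3>: quadword resolution, untranslated bits only


@dataclass
class PaQueueEntry:
    addr: int                  # physical address of the pending destination
    valid: bool = True
    is_quadword: bool = False  # per-entry state bit: DL = quadword


def pa_queue_conflict(read_addr: int, entries: List[PaQueueEntry]) -> bool:
    """Return True if a read matches any valid entry on address bits <8:3>.

    `entries` is assumed to be ordered oldest first, so the last element is
    the most recently loaded entry (an assumption of this sketch).
    """
    for position, entry in enumerate(entries):
        if not entry.valid:
            continue
        last_loaded = position == len(entries) - 1
        if last_loaded and entry.is_quadword:
            continue  # comparator inhibited to avoid the quadword deadlock
        if (read_addr & CONFLICT_MASK) == (entry.addr & CONFLICT_MASK):
            return True
    return False


# Figure 12-42 case: longword destination with addr<2:0> = 010,
# byte DREAD with addr<2:0> = 101, both in the same aligned quadword.
queue = [PaQueueEntry(addr=0x1002)]
print(pa_queue_conflict(0x1005, queue))  # true conflict, low three bits differ
print(pa_queue_conflict(0x3002, queue))  # false conflict, bits <8:3> still match
print(pa_queue_conflict(0x1008, queue))  # next quadword, no conflict
```

The second call shows the cost of the partial compare: two unrelated addresses that agree in bits <8:3> are treated as conflicting.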

• I/O space reads prefetched by the Ibox which are destined for the Ebox must be inhibited 
until the Ebox is stalling on that particular I/O space read: Since certain I/O devices can 
cause their state to change based on a read reference to that device, the possibility exists 
for I/O device state to be improperly modified based on Ibox prefetching of operands. We 
must guarantee that any state change only occurs within the context of Ebox execution of the 
corresponding instruction. 

Thus, I/O space reads are aborted in S5 until we can guarantee that the Ebox is executing 
the instruction corresponding to the I/O space read. This function is implemented by aborting 
any I/O space read originating from the SPEC_QUEUE which returns data to the Ebox when 
either of the following two conditions is true: 

1. E%START_IBOX_IO_RD_H is deasserted. E%START_IBOX_IO_RD_H is an Ebox signal that 
informs the Mbox that the S3 Ebox pipe is currently in MD_STALL waiting for an 
operand to be returned. Thus, the deassertion of this signal indicates that the Ebox 
cannot currently be stalling on the I/O space operand. 

2. A NOP command does not currently exist in the S6 pipe. This condition is necessary to 
account for a timing boundary condition which can exist between the Mbox and Ebox. 
It is possible for the Ebox to be MD_STALLing on an S6 reference corresponding to a 
previous instruction when the I/O read is in S5. In this case, E%START_IBOX_IO_RD_H 
could be asserted in reference to the previous MD data which may exist in the S6 pipe 
while the I/O space reference exists in the S5 pipe. To avoid this potential problem, the 
I/O space reference is aborted until a NOP is detected in S6 which indicates that this 
boundary condition cannot exist. 

Note that it is necessary to stipulate that this abort condition only affect Ibox I/O space 
DREAD references which directly return data to the Ebox. This is because it is conceivable 
that a deferred mode destination specifier could cause the DREAD of the address of the 
operand to map to I/O space. In this situation, the Ebox will never MD_STALL on 
this reference since it corresponds to a destination specifier. Thus, the pipeline could 
hang if the Mbox unconditionally aborted all Ibox I/O space DREADs. By conditioning 
M_QUE%S5_DEST_H into this abort equation, this deadlock condition is avoided by only 
applying this abort condition to DREADs which return data to the Ebox. 
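
The abort equation for prefetched I/O space reads can be summarized by the following Python sketch. It is a minimal restatement of the two conditions above, not the actual logic equation; the function name and the boolean parameters are hypothetical stand-ins for the signals named in the text (s5_dest for M_QUE%S5_DEST_H, start_ibox_io_rd for E%START_IBOX_IO_RD_H).

```python
def abort_io_space_read(is_io_space: bool,
                        from_spec_queue: bool,
                        s5_dest: bool,
                        start_ibox_io_rd: bool,
                        s6_is_nop: bool) -> bool:
    """Return True when the S5 I/O space read should be aborted (sketch)."""
    # Only Ibox-prefetched I/O space reads are subject to this abort.
    if not (is_io_space and from_spec_queue):
        return False
    # A read for a destination specifier does not return data to the Ebox,
    # so the Ebox never MD_STALLs on it; aborting it could hang the pipeline.
    if s5_dest:
        return False
    # Condition 1: the Ebox is not stalling on an operand.
    # Condition 2: S6 does not hold a NOP, so the stall may belong to
    #              a previous instruction whose data is still in S6.
    return (not start_ibox_io_rd) or (not s6_is_nop)


# The read proceeds only when the Ebox is stalled and S6 holds a NOP.
print(abort_io_space_read(True, True, False, True, True))
print(abort_io_space_read(True, True, False, True, False))
```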

• Aborting reads to the same Pcache index as a pending read/fill operation: As stated in 
Section 12.2.13, allowing two Pcache fill sequences to simultaneously operate on the same 
Pcache block creates the possibility of corrupting this Pcache block. To prevent this, address 
bits <8:5> of the DMISS_LATCH are compared against M_QUE%S5_PA_H<8:5> when S5 
contains an IREAD and the DMISS_LATCH is validated. If there is a match, the S5 IREAD 
is aborted in order that a potential I-stream fill sequence does not pollute the Pcache block 
associated with the D-stream fill already in progress. 

Note that address bits <8:5> are used to detect a Pcache index address conflict even though 
bits <11:5> represent the entire Pcache index. The upper three bits of the Pcache index are 
not used because these can be translated address bits which are not available in time for 
the address comparator circuit. By only using bits <8:5>, some false address conflicts may 
occur. A false address conflict will needlessly delay processing of a read or write reference; 
however, the NVAX performance model has shown that this has a negligible impact on overall 
performance. 

Even if a true Pcache index conflict is detected, it is possible that there is no block conflict 
because the 2-way set associative Pcache contains two blocks per index. In order to reduce 
hardware complexity however, a block conflict is assumed to have occurred whenever an index 
conflict is detected even though the references may address different blocks within the index. 

By the same rationale, the same address bits of a valid IMISS_LATCH are compared against 
M_QUE%S5_PA_H<8:5> when S5 contains a D-stream read. If a match is found, the S5 read is 
aborted in order to let the I-stream fill proceed without possible corruption. 

• Aborting writes or STOREs to the same Pcache index as a pending read/fill operation: As 
stated in Section 12.3.18.1.1, writes should be inhibited from executing if they map to the 
same Pcache block as a Pcache fill already in progress. Otherwise, the memory write data 
could miss in the Pcache block during a fill sequence before the Cbox supplied the fill data. 
When this subblock is filled by the Cbox, this Pcache subblock would be validated with old 
data. Therefore, the write data which was processed by the Mbox would not be reflected in 
the Pcache. 

Avoiding this situation is accomplished by the comparators built into the DMISS_LATCH and 
IMISS_LATCH. If either of these latches is valid, and bits <8:5> of the fill address equal 
M_QUE%S5_PA_H<8:5> of an S5 write or S5 STORE, then the S5 write is aborted. Note that 
since the entire write address is not compared, we may abort writes when there was not 
a true address conflict. This is done, however, for circuit speed reasons and does not affect the 
overall CPU performance appreciably. 
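
The three Pcache index conflict checks in this section can be collected into one Python sketch. It models only the comparisons stated above (IREAD against the DMISS_LATCH, D-stream read against the IMISS_LATCH, write or STORE against either latch). The function names and the string command encoding are hypothetical, and "DREAD" stands in for any D-stream read.

```python
INDEX_MASK = 0x1E0  # address bits <8:5>, the untranslated part of the Pcache index


def index_match(s5_pa: int, latch_pa: int, latch_valid: bool) -> bool:
    """True when a valid miss latch and the S5 address agree on bits <8:5>."""
    return latch_valid and ((s5_pa ^ latch_pa) & INDEX_MASK) == 0


def abort_for_fill_conflict(cmd: str, s5_pa: int,
                            dmiss_pa: int, dmiss_valid: bool,
                            imiss_pa: int, imiss_valid: bool) -> bool:
    """Return True when the S5 reference must be aborted (sketch)."""
    d_hit = index_match(s5_pa, dmiss_pa, dmiss_valid)
    i_hit = index_match(s5_pa, imiss_pa, imiss_valid)
    if cmd == "IREAD":
        return d_hit           # do not disturb the D-stream fill in progress
    if cmd == "DREAD":
        return i_hit           # do not disturb the I-stream fill in progress
    if cmd in ("WRITE", "STORE"):
        return d_hit or i_hit  # the write could be lost under either fill
    return False


# A STORE whose index bits <11:9> differ from the fill address still aborts,
# because only bits <8:5> are compared (a false conflict).
print(abort_for_fill_conflict("STORE", 0x0E20, 0, False, 0x0020, True))
```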

12.3.18.1.2 Aborting due to lack of hardware resources 

• Aborting a "read-like" reference when the RTY_DMISS_LATCH is full: Consider the situation 
where a D-stream fill is executing and the RTY_DMISS_LATCH stores the next read to be 
executed. If a third read is started in S5, it is automatically aborted. If the third read were 
not aborted, two incorrect scenarios could result. The third read could miss in S6 with 
nowhere to put it, since both the DMISS_LATCH and the RTY_DMISS_LATCH are full. If 
the third read hit, its data would be returned before the data of the second read, which is 
equivalent to an illegal "hit under miss" scenario. 

For the purposes of the above discussion, a "read-like" reference is defined as any reference 
which returns data to the Ebox. Thus, a read-like reference is a DREAD, DREAD_MODIFY, 
DREAD_LOCK, IPR_RD, or PROBE command. 

• Aborting DEST_ADDR or DREAD_MODIFY due to insufficient room in PA_QUEUE: If a 
destination specifier reference is executing in S5, but there are insufficient PA_QUEUE 
entries to store the reference, the Mbox has no choice but to abort the S5 reference and 
retry it later when more PA_QUEUE entries free up. If the S5 reference is unaligned, the 
abort logic tests for two empty slots in the PA_QUEUE since two will be required for the 
unaligned reference. If the S5 reference is aligned, only one slot need be available. 

• Aborting an S5 write, STORE or Cbox IPR_WR due to Cbox back-pressure: All S6 Cbox writes 
are automatically transferred to a write buffer in the Cbox. The Cbox uses this write buffer to 
store the writes until they can be written into the Cbox, Bcache or main memory. If this write 
buffer becomes sufficiently full so that we cannot guarantee that the S5 write or STORE can 
be loaded into the write buffer when it propagates to S6, the S5 command is aborted. The 
Cbox asserts write buffer back-pressure to the Mbox by asserting C%WR_BUF_BACK_PRES_H. 

12.3.18.1.3 Aborting due to memory management operation 

When a tb_miss or cross-page condition is detected, a memory management operation must be 
processed before the S5 reference can be allowed to complete. Thus, detection of a tb_miss 
or cross-page condition causes the S5 command to be aborted until the memory management 
operation finishes. This also prevents the possibility of having to handle a second memory 
management sequence before the first memory management sequence completes. 

The two specific abort conditions are: 

• Aborting an S5 reference due to TB_MISS condition: If the virtual address of the S5 reference 
is not found in the TB, the corresponding physical address cannot be immediately derived. 
Therefore, the reference is aborted until the translation can be cached in the TB (See 
Section 12.5.1.5.2 for information on memory management). 

• Aborting an S5 reference due to CROSS_PAGE condition: If an unaligned S5 reference 
references two pages, a CROSS_PAGE condition has been detected. In this situation, access 
checks of both pages must be made before the reference is allowed to complete. Therefore, 
the reference is aborted and retried after the CROSS_PAGE check has tested the upper page 
(See Section 12.5.1.5.4). 

In either situation described above, all but two types of Ibox or Ebox references 
will be continually aborted until the memory management sequence completes. The two 
exceptions are the STOP_SPEC_Q and STORE commands. Since these references are guaranteed 
not to require any memory management function, these references are allowed to proceed. Note 
that while a STOP_SPEC_Q command is never aborted, it is transformed into a NOP command 
as it enters the S6 pipe. This is allowable since no S6 function is performed by this command 
and it offers an extra S6 data bypass opportunity. 

12.3.18.1.4 Aborting due to an external flush condition 

This abort condition will be explained in the discussion of flushes. 

12.3.19 MBOX PIPELINE DEADLOCK AVOIDANCE SCENARIOS 

Two special considerations have been designed into the Mbox in order to avoid two possible 
pipeline deadlock conditions. 

12.3.19.1 Unaligned Reference Deadlock Condition 

Consider the situation where the second part of an unaligned D-stream read is driven into S5 
from the VAP_LATCH. If this read conflicts with the quadword address of a valid PA_QUEUE 
entry, this read will be aborted based on PA_QUEUE address conflict detection. 

If the VAP_LATCH is not cleared, a pipeline deadlock situation has occurred because the 
VAP_LATCH command will always execute before an EM_LATCH command. However, a STORE 
command originating from the EM_LATCH is the only way the PA_QUEUE conflict can be 
eliminated. Therefore, in addition to aborting the VAP_LATCH reference during a PA_QUEUE 
conflict, the VAP_LATCH must be invalidated in order that the arbitration logic can select the 
EM_LATCH STORE command to clear the PA_QUEUE conflict condition. 

Clearing the VAP_LATCH due to PA_QUEUE conflict detection has several implications. It means 
that the unaligned sequence must be restarted from the beginning in order to re-generate the 
VAP_LATCH reference. This is why the corresponding SPEC_QUEUE entry is not invalidated 
until the entire unaligned sequence successfully completes in S5. A side effect of this is that the 
first read of the unaligned sequence will be re-executed causing two read references to the same 
data. This, however, is harmless if the read is to memory. It may not be harmless if the read is 
to I/O space; however, unaligned I/O space reads are defined to yield UNPREDICTABLE results. 

Another implication of avoiding this pipeline deadlock is that the bottom entry of the PA_QUEUE 
must be invalidated if the VAP_LATCH command was a DREAD_MODIFY command. If it was 
a DREAD_MODIFY, the first reference of the unaligned pair had already introduced an entry 
into the PA_QUEUE. Since the first reference will be re-executed, the corresponding PA_QUEUE 
entry is invalidated to avoid loading the same PA_QUEUE entry twice. 

12.3.19.2 READ_LOCK/WRITE_UNLOCK Deadlock Condition 

Once a READ_LOCK command has been passed to the Cbox, the Cbox will not process any 
subsequent D-stream read references until the corresponding WRITE_UNLOCK command has 
been executed. This behavior introduces a deadlock consideration. 

Consider the situation where a DREAD_LOCK has been sent to the Cbox. Before the EM_LATCH 
is loaded with the corresponding WRITE_UNLOCK, the Mbox starts processing an IREAD 
reference which misses in the TB. The resulting memory management sequence will issue a 
D-stream PTE read which the Cbox will not process until it has received the WRITE_UNLOCK 
command. However, the Mbox will never send the WRITE_UNLOCK (or any other Ebox or Ibox 
reference) until the memory management sequence completes, which cannot occur until the PTE 
DREAD completes. 

This deadlock condition is avoided by the arbitration logic by disabling IREF_LATCH selection 
once a DREAD_LOCK command has successfully been retired from the S5 pipe. Thus, no IREAD 
TB_MISS can occur between the READ_LOCK and WRITE_UNLOCK, which avoids the deadlock 
situation. 

The arbitration logic will re-enable IREF_LATCH selection on either of the following two 
conditions: 

1. A WRITE_UNLOCK reference has been retired from the S5 pipe. This will cause the Cbox 
to resume D-stream read processing, thus eliminating the deadlock condition. 

2. E%FLUSH_MBOX_H is asserted by the Ebox due to a hard error. This condition should 
occur much less frequently than the above condition because a WRITE_UNLOCK must 
normally be issued after a READ_LOCK. However, if an error occurred sometime between 
the READ_LOCK and WRITE_UNLOCK, a hard error microtrap will result preventing a 
WRITE_UNLOCK from being issued. The microtrap will generate E%FLUSH_MBOX_H which 
re-enables IREF_LATCH selection because no WRITE_UNLOCK will follow. 

Note that the Cbox state, which prevents subsequent D-stream reads from being processed 
before the WRITE_UNLOCK, will be cleared by an IPR_WRITE during the error handler. 

Note that the analogous deadlock condition involving a SPEC_QUEUE reference cannot occur 
because Ibox processing will have been halted prior to the READ_LOCK/WRITE_UNLOCK 
sequence. The analogous deadlock condition involving an EM_LATCH reference will not 
occur because Ebox microcode will never issue a D-stream read in the middle of a 
READ_LOCK/WRITE_UNLOCK sequence. 
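
The enable and disable rules for IREF_LATCH selection amount to a one-bit state machine, sketched below in Python. This is an illustrative model under the assumptions stated in the comments. The class and method names are hypothetical and do not correspond to named logic in the Mbox.

```python
class IrefArbiter:
    """One-bit model of IREF_LATCH selection enable (illustrative sketch)."""

    def __init__(self) -> None:
        self.iref_enabled = True  # assumed initial state: selection allowed

    def retire_from_s5(self, cmd: str) -> None:
        """Update the enable when a command is successfully retired from S5."""
        if cmd == "DREAD_LOCK":
            # An IREAD TB_MISS from here on could deadlock behind the Cbox lock.
            self.iref_enabled = False
        elif cmd == "WRITE_UNLOCK":
            # The Cbox resumes D-stream read processing.
            self.iref_enabled = True

    def flush_mbox(self) -> None:
        """E%FLUSH_MBOX_H asserted: no WRITE_UNLOCK will follow."""
        self.iref_enabled = True


arb = IrefArbiter()
arb.retire_from_s5("DREAD_LOCK")
print(arb.iref_enabled)   # IREF_LATCH selection is disabled
arb.retire_from_s5("WRITE_UNLOCK")
print(arb.iref_enabled)   # selection is re-enabled
```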

12.3.20 THE SPEC_Q_SYNC_CTR 

The PA_QUEUE address comparator function can maintain the relative order of specifier reads 
and destination specifier writes because both the reads and the writes originate from the same 
Ibox pipeline stage and are loaded into the same reference queue. However, when the Ebox issues 
reads or writes independently of the Ibox destination specifier decodes, the PA_QUEUE cannot 
be used since there is no implied ordering between the Ibox reads and Ebox reads or writes from 
two different pipeline stages. In this case, an 8-state counter, called the SPEC_Q_SYNC_CTR, is 
used to prevent Ibox memory operand prefetching when the Ebox can be writing to memory. 

When the Ibox decodes an instruction that can cause explicit Ebox writes which are independent 
of the Ibox destination specifier decodes (e.g. MOVC), the Ibox loads the SPEC_QUEUE with a 
STOP_SPEC_Q command after all specifier references for the same instruction have been loaded. 
Execution of STOP_SPEC_Q in S5 causes the SPEC_Q_SYNC_CTR to be decremented. The 
nominal state of this counter is one. Whenever the value of SPEC_Q_SYNC_CTR is zero, the 
arbitration logic will not select a SPEC_QUEUE reference as the source for the S5 pipe for 
the next cycle. The effect achieved is to stop all Ibox specifier references from occurring after 
the STOP_SPEC_Q command has executed. When the Ebox completes all explicit writes for 
the instruction which caused the Ibox to issue the STOP_SPEC_Q command, the Ebox asserts 
the E%RESTART_SPEC_QUEUE_H signal. Each assertion of E%RESTART_SPEC_QUEUE_H causes the 
SPEC_Q_SYNC_CTR to be incremented. Subsequent specifier reference processing resumes 
when the value of SPEC_Q_SYNC_CTR is positive. Thus, the SPEC_Q_SYNC_CTR acts as 
a synchronization device to stop processing of specifier references whenever the Ebox may be 
independently modifying memory state. 

Note that a value of zero in the SPEC_Q_SYNC_CTR only prevents the arbitration logic from 
selecting the SPEC_QUEUE as the S5 reference source. It does not prevent the Ibox from loading 
additional references into empty SPEC_QUEUE entries. 

The SPEC_Q_SYNC_CTR is an 8-state unsigned counter which can store values from 0 to 7. 
A counter function must be used for this synchronization function because pipeline behavior 
can cause the Ebox to assert E%RESTART_SPEC_QUEUE_H multiple times before the Mbox ever 
processes any STOP_SPEC_Q commands. For example, if the Mbox is executing a TB_MISS flow 
while the Ebox is retiring multiple instructions associated with this synchronization scheme, 
multiple assertions of E%RESTART_SPEC_QUEUE_H will result even though no STOP_SPEC_Q 
commands have been processed yet due to the on-going memory management sequence. 
Thus, the SPEC_Q_SYNC_CTR buffers up the E%RESTART_SPEC_QUEUE_H assertions until the 
corresponding STOP_SPEC_Q commands are processed from the SPEC_QUEUE. Note that there 
is no need for the SPEC_Q_SYNC_CTR to buffer up multiple instances of STOP_SPEC_Q because 
the SPEC_QUEUE intrinsically buffers these instances. 

The 8-state SPEC_Q_SYNC_CTR can buffer up to six E%RESTART_SPEC_QUEUE_H assertions 
(SPEC_Q_SYNC_CTR values 2 through 7). Six buffer states are sufficient to buffer all pending 
instructions which could result in the Ebox assertion of E%RESTART_SPEC_QUEUE_H because at 
most six of these instructions can be issued to the Ebox before the Ibox is back-pressured from 
decoding the next instruction of this type. Six buffered states are derived from the fact that 
the Ibox must fill its four-stage pipeline in addition to the 2-entry SPEC_QUEUE before it is 
back-pressured by the SPEC_QUEUE from issuing any further instructions for which the Ebox 
could assert E%RESTART_SPEC_QUEUE_H. 
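
The counter behavior described in this section can be modeled as follows. This Python sketch is illustrative only. The class and method names are hypothetical, and raising an exception on an out-of-range update is a modeling choice made here, since the text does not say how the hardware would behave if the sizing argument above were violated. The reset to zero on E%FLUSH_MBOX_H is taken from Section 12.3.21.2.2.

```python
class SpecQSyncCtr:
    """Behavioral model of the 8-state SPEC_Q_SYNC_CTR (illustrative sketch)."""

    MAX_VALUE = 7

    def __init__(self) -> None:
        self.value = 1  # nominal state

    def spec_queue_selectable(self) -> bool:
        """The arbitration logic may select the SPEC_QUEUE only when value > 0."""
        return self.value > 0

    def stop_spec_q(self) -> None:
        """STOP_SPEC_Q executes in S5: decrement."""
        if self.value == 0:
            # Cannot happen in the described design: a zero count already
            # prevents STOP_SPEC_Q from being selected out of the SPEC_QUEUE.
            raise RuntimeError("STOP_SPEC_Q executed while SPEC_QUEUE is blocked")
        self.value -= 1

    def restart_spec_queue(self) -> None:
        """E%RESTART_SPEC_QUEUE_H asserted: increment."""
        if self.value == self.MAX_VALUE:
            raise OverflowError("more than six buffered restarts")
        self.value += 1

    def flush_mbox(self) -> None:
        """E%FLUSH_MBOX_H asserted: unconditional reset to 0."""
        self.value = 0


# TB_MISS example: two restarts arrive before either STOP_SPEC_Q is processed.
ctr = SpecQSyncCtr()
ctr.restart_spec_queue()
ctr.restart_spec_queue()
ctr.stop_spec_q()
ctr.stop_spec_q()
print(ctr.value, ctr.spec_queue_selectable())
```

In the example, the counter returns to its nominal value once both buffered restarts have been matched by their STOP_SPEC_Q commands, and the SPEC_QUEUE is never blocked.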

12.3.21 FLUSHING REFERENCES FROM THE MBOX PIPE 

Flushing the Mbox pipeline refers to altering the state of the Mbox in a controlled way so that 
certain pending and currently executing references are eliminated from the Mbox. There are 
two distinct mechanisms that cause different types of references to be flushed. One type of flush 
originates from the Ibox and the other type from the Ebox. 

12.3.21.1 Ibox Flushes 

If the Ibox VIC is in the process of being filled by a previously requested IREAD, and the 
Ibox has determined, or has been forced, to start decoding instructions at a new point in the 
I-stream requiring another VIC fill, the Ibox asserts the signal, I%FLUSH_IREF_LAT_H, to the 
Mbox. From the Ibox point of view, assertion of I%FLUSH_IREF_LAT_H indicates that the current 
VIC fill operation will be immediately cancelled. This allows the Ibox to invoke a new VIC fill 
operation via a new IREAD, without having to wait for the current VIC fill operation to complete. 

From the Mbox point of view, assertion of I%FLUSH_IREF_LAT_H aborts all pending and currently 
executing I-stream activity by performing the following actions: 

1. The IREF_LATCH is invalidated. Any IREAD sent to the Mbox during the cycle 
I%FLUSH_IREF_LAT_H is asserted is not validated. 

2. If the current S5 reference is an IREAD or an I_CF, it is aborted. 

3. The IMISS_LATCH is invalidated and all state indicating an outstanding I-stream fill is 
cleared. If the IMISS_LATCH is being loaded during the cycle that I%FLUSH_IREF_LAT_H is 
asserted, the IMISS_LATCH is not validated. 

4. The signal, M%ABORT_CBOX_IRD_H, is asserted to the Cbox to indicate that the Mbox does 
not want any more I_CF references which may have been pending in the Cbox. 
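
The four actions above can be restated as a small Python sketch. The IstreamState record and its field names are hypothetical and exist only to make the list concrete. The sketch does not model the same-cycle cases (an IREAD arriving, or the IMISS_LATCH being loaded, in the cycle of the flush).

```python
from dataclasses import dataclass


@dataclass
class IstreamState:
    """Hypothetical snapshot of Mbox I-stream state for one cycle."""
    iref_latch_valid: bool = False
    imiss_latch_valid: bool = False
    s5_cmd: str = "NOP"
    s5_aborted: bool = False
    abort_cbox_ird: bool = False   # models M%ABORT_CBOX_IRD_H


def ibox_flush(state: IstreamState) -> IstreamState:
    """Apply the effects of I%FLUSH_IREF_LAT_H to the state (sketch)."""
    state.iref_latch_valid = False           # action 1
    if state.s5_cmd in ("IREAD", "I_CF"):    # action 2
        state.s5_aborted = True
    state.imiss_latch_valid = False          # action 3
    state.abort_cbox_ird = True              # action 4
    return state


after = ibox_flush(IstreamState(iref_latch_valid=True,
                                imiss_latch_valid=True,
                                s5_cmd="IREAD"))
print(after)
```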

If I%FLUSH_IREF_LAT_H is asserted during a cycle with an outstanding I-stream read or fill, the 
Mbox logic guarantees that the M%VIC_DATA_L signal will not be asserted in response to the 
IREAD during any subsequent cycles. However, M%VIC_DATA_L may be asserted during the same 
cycle that I%FLUSH_IREF_LAT_H is asserted. It is the responsibility of the Ibox to ignore the 
corresponding data in this case. 

12.3.21.2 Ebox Flushes 

12.3.21.2.1 Flushing due to E%EM_ABORT_L 

Due to the construction of the microcode, it is possible for the Ebox to issue a reference to the 
Mbox only to discover during the following cycle that the reference should not have been issued. 
In this case, the Ebox asserts E%EM_ABORT_L during the cycle following when the reference was 
issued. E%EM_ABORT_L causes the Mbox to unconditionally clear the EM_LATCH and to abort 
the S5 reference if that reference was driven from the EM_LATCH. The net effect is to flush out 
all Mbox state associated with this Ebox reference. 

12.3.21.2.2 Flushing due to E%FLUSH_MBOX_H 

When the Ebox determines that a branch misprediction took place, or that process context is to 
be changed, or that an exception or interrupt has occurred, the macropipeline must be flushed in 
order that no processor state changes as a result of subsequent pipeline operations. As part of 
this flush operation, all pending or currently executing references in the Mbox which correspond 
to flushed instructions are immediately and permanently aborted. The Ebox informs the Mbox 
of this situation by asserting E%FLUSH_MBOX_H. 

The assertion of E%FLUSH_MBOX_H invokes the following Mbox actions: 

1. The SPEC_QUEUE is invalidated. Any reference sent to the Mbox SPEC_QUEUE during 
the cycle in which E%FLUSH_MBOX_H is asserted is not validated. 

2. The SPEC_Q_SYNC_CTR is unconditionally reset to the value of 0. The effect of this is to 
inhibit further SPEC_QUEUE reference processing by never selecting the SPEC_QUEUE 
as the S5 reference source (See Section 12.3.20). It does not inhibit the Ibox from loading 
references into the SPEC_QUEUE during subsequent cycles, however. This function is 
associated with the scheme for flushing the PA_QUEUE. See Section 12.3.21.2.3. 

3. If the current S5 reference was driven from the SPEC_QUEUE, it is aborted. 

4. If the EM_LATCH contains any type of read, IPR_RD, probe or MME_CHK, it is invalidated. 
Any reference sent to the EM_LATCH during the cycle that E%FLUSH_MBOX_H is asserted is 
not validated. 

5. If the current S5 reference was driven from the EM_LATCH, and this reference is any type 
of read, IPR_RD, probe or MME_CHK, it is aborted. 

6. If the VAP_LATCH contains any type of read or DEST_ADDR, it is invalidated. If a read or 
DEST_ADDR is being loaded into the VAP_LATCH during the cycle that E%FLUSH_MBOX_H 
is asserted, the VAP_LATCH is not validated. 

7. If the current S5 reference was driven from the VAP_LATCH, and this reference is any type 
of read or DEST_ADDR, it is aborted. 

8. If the RTY_DMISS_LATCH contains any type of an Ibox or Ebox read, it is invalidated. If 
an Ibox or Ebox read is being loaded into the RTY_DMISS_LATCH during the cycle that 
E%FLUSH_MBOX_H is asserted, the RTY_DMISS_LATCH is not validated. 

9. If the current S5 reference was driven from the RTY_DMISS_LATCH, and this reference is 
an Ibox or Ebox read, it is aborted. 

10. If the DMISS_LATCH contains a currently outstanding Ibox or Ebox read, the 
DMISS_LATCH state is modified to indicate that the data should not be sent to the Ibox 
or Ebox when the data becomes available. 

11. MMESTS<31:29> are cleared. This unlocks the MMESTS register. 

The effect of items 1 through 10 above can be summarized as follows. All Ibox and Ebox D-stream 
reads which have not yet propagated into S6 are discarded. Note that Mbox D-stream reads 
(PTE references) are not affected by E%FLUSH_MBOX_H. Any outstanding D-stream fill sequence 
corresponding to an Ibox or Ebox D-stream read is allowed to complete in order that the D-stream 
data is filled in the Pcache. However, the requested data will not be returned to the Ibox 
and/or Ebox. Any WRITE or STORE reference which existed in one of the Mbox reference 
sources PRIOR to the E%FLUSH_MBOX_H assertion is allowed to complete in the presence of the 
E%FLUSH_MBOX_H assertion. This is necessary because any write data existing in the Mbox prior 
to the E%FLUSH_MBOX_H assertion represents a memory modification corresponding to an action 
before the Ebox decided to flush. 
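
The per-source rules summarized above can be expressed as a lookup, sketched here in Python. This is an illustration under stated assumptions, not the Mbox logic. The function name and string encodings are hypothetical, and the set used for "any type of read" (DREAD, DREAD_MODIFY, DREAD_LOCK) is an assumption of this sketch. The DMISS_LATCH case (item 10) is omitted because that latch is modified rather than invalidated.

```python
# Assumed membership of "any type of read" for this sketch.
READS = {"DREAD", "DREAD_MODIFY", "DREAD_LOCK"}


def flushed_by_mbox_flush(source: str, cmd: str, issued_by_mbox: bool = False) -> bool:
    """Return True if E%FLUSH_MBOX_H eliminates this pending reference (sketch)."""
    if source == "SPEC_QUEUE":
        return True                                         # items 1 and 3
    if source == "EM_LATCH":
        return cmd in READS | {"IPR_RD", "PROBE", "MME_CHK"}  # items 4 and 5
    if source == "VAP_LATCH":
        return cmd in READS | {"DEST_ADDR"}                 # items 6 and 7
    if source == "RTY_DMISS_LATCH":
        # Only Ibox or Ebox reads; Mbox PTE reads are not affected.
        return cmd in READS and not issued_by_mbox          # items 8 and 9
    return False


# Writes and STOREs already in the Mbox survive the flush.
print(flushed_by_mbox_flush("EM_LATCH", "STORE"))
print(flushed_by_mbox_flush("EM_LATCH", "PROBE"))
```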

If E%FLUSH_MBOX_H is asserted during a cycle with an outstanding D-stream read or 
D-stream fill, the Mbox logic guarantees that the M%IBOX_DATA_L and M%EBOX_DATA_H signals 
will not be asserted in response to the D-stream read/fill during any subsequent cycles. 
However, M%IBOX_DATA_L or M%EBOX_DATA_H may be asserted during the same cycle that 
E%FLUSH_MBOX_H is asserted. It is the responsibility of the Ibox and Ebox to ignore the 
corresponding data in this case. 

Note that I%FLUSH_IREF_LAT_H causes an outstanding I-stream fill sequence to be completely 
stopped, but E%FLUSH_MBOX_H allows an outstanding D-stream fill sequence to continue without 
returning data to the Ibox and/or Ebox. These two cases are handled differently based on 
performance model data which indicates that it is beneficial to future references to complete 
the D-stream fill, but allowing the I-stream fill to complete only hinders the immediate need of 
accessing different I-stream data. 

12.3.21.2.3 Ebox Flushing of the PA_QUEUE 

The function of E%FLUSH_MBOX_H described above is to clear out reference state associated with 
instructions that had not yet been started by the Ebox. Note, however, that E%FLUSH_MBOX_H 
does not flush the PA_QUEUE even though the PA_QUEUE may contain reference state that 
should be logically flushed by E%FLUSH_MBOX_H. This is because the PA_QUEUE may also contain 
reference state associated with the currently executing Ebox instruction. The PA_QUEUE entries 
associated with the currently executing Ebox instruction must be retired from the PA_QUEUE 
in the normal fashion before the remaining PA_QUEUE entries may be flushed. 

Thus, flushing the PA_QUEUE is a two-step process described as follows: 

1. As described in Section 12.3.21.2, E%FLUSH_MBOX_H inhibits the Mbox arbitration logic from 
selecting SPEC_QUEUE references for processing during subsequent cycles. This function 
guarantees that no more PA_QUEUE entries can be filled during subsequent cycles. 

2. Once the Ebox has issued all STOREs corresponding to state modifications that must occur 
before the Mbox is completely flushed, the Ebox issues another reference which is qualified 
with the E%FLUSH_PA_QUEUE_H signal. Once this EM_LATCH reference executes in S5, 
the Mbox is guaranteed to have completed all preceding STORE references. Thus, when 
this EM_LATCH reference executes, the remaining entries in the PA_QUEUE are flushed. 
Note that both halves of an unaligned STORE will complete before the "E%FLUSH_PA_QUEUE" 
reference is executed because the second half of the reference is stored in the VAP_LATCH, 
which has higher priority than the EM_LATCH. 

The Ebox will assert E%RESTART_SPEC_QUEUE_H once the "E%FLUSH_PA_QUEUE" reference 
has been latched in the EM_LATCH. E%RESTART_SPEC_QUEUE_H re-enables Mbox processing 
of SPEC_QUEUE references during subsequent cycles. 

MICROCODE RESTRICTION 

Once E%FLUSH_MBOX_H has been asserted, E%FLUSH_PA_QUEUE_H and E%RESTART_SPEC_QUEUE_H must be 
asserted before the Ibox or Ebox require further Mbox processing of Ibox or Ebox 
D-stream references. E%FLUSH_PA_QUEUE_H and E%RESTART_SPEC_QUEUE_H must be 
asserted during a cycle subsequent to the assertion of E%FLUSH_MBOX_H, and only 
when the microcode guarantees that all corresponding STORE commands have been 
retired by the EM_LATCH. 

12.4 THE PCACHE 

The Pcache is a two-way set associative, read allocate, no-write allocate, write through, physical 
address cache of I-stream and D-stream data. The Pcache has a one cycle access and a one 
cycle repetition rate for both reads and writes. It stores 8192 bytes (8K) of data and 256 
tags corresponding to 256 hexaword blocks (1 hexaword = 32 bytes). Each tag is 20 bits wide 
corresponding to bits <31:12> of the physical address. There are four quadword subblocks per 
block with a valid bit associated with each subblock. The access size for both Pcache reads and 
writes is one quadword. Even byte parity is maintained for each byte of data (32 bits per block). 
One bit of even parity is maintained for every tag. 

The logical organization of the Pcache is shown below: 
Figure 12-43: Logical Pcache Organization 

| A | TP | TAG | VB | D/DP | D/DP | D/DP | D/DP | TP | TAG | VB | D/DP | D/DP | D/DP | D/DP | 

where: A    = Allocation bit. Indicates whether the left or right bank was last allocated. 
       TP   = 1 bit of even tag parity. 
       TAG  = 20 bits of tag address. 
       VB   = 4 valid bits. Each bit corresponds to 8 bytes of data. 
       D/DP = 8 bytes of data with 8 bits of even byte parity (72 total bits). 



The Pcache is logically organized into 128 direct mapped indexes, where each index consists of 
two blocks, and each block consists of: 20-bit tag, 1-bit tag parity, 4 valid bits, 256 bits of data, 
and 32 bits of data parity. In addition, each index also contains a one bit allocation pointer. 

The breakdown of address bits for Pcache decoding is shown below: 
where: tag address    - bits loaded into or compared with tag. 
       index address  - addresses 1 of 128 indexes. 
       subblk address - addresses 1 of 4 aligned quadwords within the hexaword data block. 
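
The field split described above can be sketched in a few lines. This is an illustrative sketch only, not part of the chip specification; the function name is hypothetical, and the bit ranges are taken from the text (tag = bits<31:12>, index = bits<11:5>, subblock = bits<4:3>).

```python
def decode_pcache_address(pa: int) -> dict:
    """Split a 32-bit physical address into Pcache tag, index, and subblock fields."""
    pa &= 0xFFFFFFFF
    return {
        "tag": (pa >> 12) & 0xFFFFF,  # bits <31:12>, 20-bit tag
        "index": (pa >> 5) & 0x7F,    # bits <11:5>, 1 of 128 indexes
        "subblk": (pa >> 3) & 0x3,    # bits <4:3>, 1 of 4 quadwords in the hexaword
    }
```

Bits<2:0> select a byte within the quadword and play no part in Pcache decoding.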



12.4.1 PCCTL 

The PCCTL controls the mode of operation of the Pcache. PCCTL is accessible by IPR_RD and 
IPR_WR operations. See Figure 12-31 for the definition of this register. 

Note that Pcache operation is further qualified by the state of PCSTS<0> (See Section 12.6 for 
more information about PCSTS). If this bit is non-zero, Pcache operation is automatically forced 
to behave as if I_ENABLE=0 and D_ENABLE=0, regardless of the actual state of I_ENABLE 
and D_ENABLE. Effectively, this shuts down normal Pcache operation due to the presence of a 
previous Pcache parity error. 

Note that Pcache invalidate operations are only disabled if both D_ENABLE=0 and I_ENABLE=0, 
or if PCSTS<0> is set. 

Note that the ELEC_DISABLE bit of PCCTL is intended for debug use only. This bit 
electrically disables the Pcache to reduce power dissipation. This bit should only be set when 
the Pcache is functionally turned off by the deassertion of both I_ENABLE and D_ENABLE. 
UNPREDICTABLE operation will result if this bit is set while either I_ENABLE or 
D_ENABLE is also set. Any further discussion concerning Pcache function assumes that 
ELEC_DISABLE is inactive. 

Also note that all Pcache IPR_RD and IPR_WR operations will function correctly regardless of the 
state of I_ENABLE or D_ENABLE or PCSTS<0>. However, Pcache array IPRs will not function 
if ELEC_DISABLE is set. 

If either D_ENABLE or I_ENABLE is to be toggled to the on state, the Pcache array must 
be initialized prior to such action. See Section 12.8.2.1 for more information about Pcache 
initialization. 

When the FORCE_HIT (Force Hit) bit is set and I-stream or D-stream operation is enabled, all 
enabled memory space read and write references are forced to hit in the Pcache regardless of the 
value of the stored tag. The BANK_SEL bit specifies which tag of the pair of tags addressed is 
forced to hit. Thus when FORCE_HIT=1, the Pcache becomes a 4K direct mapped cache with all 
reads and writes forced to hit in the Pcache. Toggling BANK_SEL causes the other half of the 
8K Pcache to become accessible in this direct mapped mode. Note that BANK_SEL never affects 
bank selection during IPR reads and IPR writes to the Pcache tags or Pcache data parity bits; 
bank selection for these commands is always determined by the specified IPR address. Also note 
that the FORCE_HIT bit only affects memory space references. I/O space references still miss in 
the Pcache regardless of the state of the FORCE_HIT bit. 

The FORCE_HIT feature is designed to facilitate testing the Pcache data array and to 
make diagnostic tests easily loadable within the Pcache by simple WRITE operations. When 
FORCE_HIT=0, the Pcache is configured as an 8K 2-way set associative cache, no reads or writes 
are forced to hit, and the BANK_SEL bit is a don't care. 

The P_ENABLE (Parity Enable) bit allows the detection of Pcache tag and data parity errors to 
be enabled or disabled. If P_ENABLE=0, Pcache parity errors will not be detected. Thus when 
P_ENABLE=0, no Pcache error will be recorded in PCSTS or will be reported to the Ebox. 

Note, however, that when FORCE_HIT=1, Pcache tag parity is never checked regardless of the 
state of P_ENABLE. 

12.4.2 Pcache Hit/Miss Determination 

12.4.2.1 Hit/Miss Determination by Tag Comparison 

When an IREAD, DREAD, DREAD_MODIFY, WRITE, WRITE_UNLOCK, or INVAL operation 
is executed, the Pcache must determine if the referenced data is present in its array. To do this, 
physical address bits<11:5> are input to the Pcache row decoders in order to determine which 
one of the 128 direct mapped indexes is being addressed. Subsequently, all 627 bits within the 
addressed index are accessed by the assertion of the corresponding word line. The two accessed 
tag values are simultaneously compared to physical address bits<31:12>. A Pcache hit condition 
occurs when all of the following conditions are simultaneously true: 

• The contents of one of the two addressed tags matches the data on M%S6_PA_H<31:12>. 

• The valid bit corresponding to both the matched tag and to the addressed subblock (specified 
by physical address bits<4:3>) is set. 

• The stored tag parity corresponding to the matched tag is the same as the value calculated 
from M%S6_PA_H<31:12>. 

If an address match is detected on one of the tags and the valid bit which corresponds to both 
the matched tag and the addressed subblock (specified by physical address bits<4:3>) is set, then a 
Pcache hit condition has been detected on the corresponding Pcache tag. The absence of the Pcache 
hit condition causes a Pcache miss condition. 
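
The three hit conditions can be sketched as follows. This is a behavioural illustration only, not the hardware implementation; the `Block` class and function names are hypothetical, and even parity is modelled as the bit that makes the total count of ones (tag plus parity bit) even.

```python
from dataclasses import dataclass, field

def even_parity(value: int) -> int:
    """Parity bit that makes the total number of 1 bits (value plus parity) even."""
    return bin(value).count("1") & 1

@dataclass
class Block:
    tag: int = 0
    tag_parity: int = 0
    valid: list = field(default_factory=lambda: [False] * 4)

def tag_compare_hit(index_blocks, pa):
    """Return the hitting bank (0 = left, 1 = right), or None on a Pcache miss."""
    tag = (pa >> 12) & 0xFFFFF   # physical address bits <31:12>
    subblk = (pa >> 3) & 0x3     # physical address bits <4:3>
    for bank, blk in enumerate(index_blocks):
        if (blk.tag == tag                              # tag match
                and blk.valid[subblk]                   # addressed subblock valid
                and blk.tag_parity == even_parity(tag)):  # stored parity agrees
            return bank
    return None
```

The caller is assumed to have already selected the two blocks of the index addressed by bits<11:5>.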

12.4.2.2 Conditions which force Pcache Miss 

The Pcache miss condition is forced to override the tag determination of hit/miss described above 
when any one of the following conditions is satisfied: 

• If PCSTS<0> is set, the Pcache miss condition is forced due to a previous Pcache parity error. 

• If an IREAD or I_CF operation is accessing the Pcache and I_ENABLE=0, the Pcache miss 
condition is forced. 

• If a D-stream read or D_CF operation is accessing the Pcache and D_ENABLE=0, the Pcache 
miss condition is forced. 

• If a DREAD_LOCK operation is executing, the Pcache miss condition is forced. This 
guarantees that the read will propagate to the Cbox for synchronization purposes. 

• If an I_CF operation is executing and the IMISS_LATCH state indicates that the reference 
cannot be cached, the Pcache miss condition is forced. 

• If a D_CF operation is executing and the DMISS_LATCH state indicates that the reference 
cannot be cached, the Pcache miss condition is forced. 

12.4.2.3 Conditions which force Pcache Hit 

The Pcache hit condition is forced to override the tag determination of hit/miss described above 
when any one of the following conditions is satisfied. Note that unless explicitly stated to the 
contrary, the forced Pcache miss conditions above take precedence over the forced Pcache hit 
conditions described below. 

• If a read reference is tagged as having a memory management fault or hard error associated 
with it (i.e. M_QUE_MS2%S6_QUAL_H<0> = 1 or M_QUEMS2%S6_QUAL_H<1> = 1), a Pcache hit 
condition is forced. NOTE: This force hit condition takes precedence over any force miss 
condition described above. 

• If the operation is a DREAD, DREAD_MODIFY, WRITE, or WRITE_UNLOCK, and 
D_ENABLE=1 and FORCE_HIT=1, the Pcache hit condition is forced on the tag corresponding 
to both the addressed Pcache index and the bank specified by the BANK_SEL bit EXCEPT 
when the address maps to I/O space. I/O references must never hit in the Pcache regardless 
of the state of FORCE_HIT. 

• If the operation is an IREAD and I_ENABLE=1 and FORCE_HIT=1, the Pcache hit condition 
is forced on the tag corresponding to both the addressed Pcache index and the bank specified 
by the BANK_SEL bit. 

• If the operation is a D_CF and D_ENABLE=1 and the DMISS_LATCH state indicates that 
the reference is cacheable, the Pcache hit condition is forced and the bank is specified by the 
allocation field of the DMISS_LATCH. 

• If the operation is an I_CF and I_ENABLE=1 and the IMISS_LATCH state indicates that the 
reference is cacheable, the Pcache hit condition is forced and the bank is specified by the 
allocation field of the IMISS_LATCH. 
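
The precedence between the forced miss and forced hit conditions of Sections 12.4.2.2 and 12.4.2.3 can be summarised in a short sketch. It is a simplified reading of the lists above, not the actual control logic. The function and parameter names are hypothetical, DREAD_LOCK is assumed to count as a D-stream read, and the I/O space exception is applied to I-stream as well as D-stream FORCE_HIT references (per the PCCTL description). Only the listed conditions are modelled; any case the lists do not mention falls through to the tag comparison result.

```python
READS = {"IREAD", "DREAD", "DREAD_MODIFY", "DREAD_LOCK"}
D_READS = {"DREAD", "DREAD_MODIFY", "DREAD_LOCK"}

def resolve_hit(op, tag_hit, *, pcsts0=False, i_enable=True, d_enable=True,
                force_hit=False, io_space=False, fault_or_hard_error=False,
                miss_latch_cacheable=True):
    """Return True for a Pcache hit, False for a miss."""
    # Forced hit that takes precedence over every forced miss condition.
    if op in READS and fault_or_hard_error:
        return True
    # Forced miss conditions (Section 12.4.2.2).
    if pcsts0:
        return False
    if op in {"IREAD", "I_CF"} and not i_enable:
        return False
    if (op in D_READS or op == "D_CF") and not d_enable:
        return False
    if op == "DREAD_LOCK":
        return False
    if op in {"I_CF", "D_CF"} and not miss_latch_cacheable:
        return False
    # Remaining forced hit conditions (Section 12.4.2.3).
    if force_hit and not io_space:
        if op in {"DREAD", "DREAD_MODIFY", "WRITE", "WRITE_UNLOCK"} and d_enable:
            return True
        if op == "IREAD" and i_enable:
            return True
    if op == "D_CF" and d_enable:
        return True
    if op == "I_CF" and i_enable:
        return True
    # Otherwise the tag comparison of Section 12.4.2.1 decides.
    return tag_hit
```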

12.4.3 Pcache Read Operation 

A Pcache read operation is initiated by a DREAD, DREAD_MODIFY, or IREAD reference. A 
Pcache read begins by determining the Pcache hit or miss condition described above. If a Pcache 
hit is detected, the quadword of data corresponding to both the tag in which the hit occurred and 
to physical address bits<4:3> is driven out of the Pcache. 

If a Pcache miss condition is asserted, all the data driven out of the Pcache is ignored except for 
the allocation bit. The allocation bit is stored in the DMISS_LATCH (in the case of a D-stream 
read) or in the IMISS_LATCH (for an IREAD). This bit will be used during a cache fill operation 
to select the appropriate block to be filled (See Section 12.4.6 for information about allocating and 
filling blocks). 
12.4.4 Pcache Write Operation 

A Pcache write operation is initiated by a STORE, WRITE or WRITE_UNLOCK reference. A 
Pcache write begins by determining the Pcache hit or miss condition described above. If a Pcache 
hit is detected, the data present on B%S6_DATA_H<63:0> is selectively written into the quadword 
corresponding to both the tag in which the hit occurred and to physical address bits<4:3>. The 
data is selectively written by using M%S6_BYTE_MASK_H<7:0> as a write enable for the eight 
respective bytes of data. The corresponding data parity is also written in the same manner for 
each corresponding byte which is written. 

If a Pcache miss condition occurs, no Pcache write operation takes place. However, the write 
reference is forwarded to the Cbox for processing regardless of the hit/miss condition in the 
Pcache. 
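
The byte-masked write can be illustrated with a small sketch. It is not the hardware datapath; the function names are hypothetical, and it assumes that bit i of the byte mask enables byte i of the quadword.

```python
def masked_quadword_write(old: bytes, new: bytes, byte_mask: int) -> bytes:
    """Merge new data into an 8-byte quadword under control of an 8-bit byte mask."""
    out = bytearray(old)
    for i in range(8):
        if byte_mask & (1 << i):   # assumed: mask bit i enables byte i
            out[i] = new[i]
    return bytes(out)

def byte_parity(data: bytes) -> list:
    """One even-parity bit per byte of data."""
    return [bin(b).count("1") & 1 for b in data]
```

Parity for a written byte is recomputed from the new byte, while parity for unwritten bytes is left unchanged, which mirrors the statement that data parity is written in the same selective manner.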

12.4.5 Pcache Replacement Algorithm 

When a Pcache miss occurs during a read operation, it must be decided which one of two blocks 
will be allocated for the subsequent Pcache fill sequence. When the Pcache miss occurred because 
no validated tag field matched the read address, the state of the corresponding allocation bit 
indicates which bank (left or right) should be used for the resulting fill sequence. The value of 
each allocation bit changes according to the "not-last-used" algorithm. That is, the allocation bit 
always points to the bank within the index that was not last accessed. 

When a read miss occurs because no validated tag field matched the read address, the value of the 
allocation bit is latched in the MISS_LATCH corresponding to the read miss. This latched value 
will be used as the bank select input during the subsequent fill sequence. As each fill operation 
takes place, the inverse of the allocation value stored in the MISS_LATCH is written into the 
allocation bit of the addressed Pcache index. During Pcache read or write operations, the value 
of the allocation bit is set to point to the bank opposite the one just referenced, because this is 
now the new "not-last-used" bank. 

The one exception to this algorithm occurs during an invalidate. When an invalidate clears the 
valid bits of a particular tag within an index, it only makes sense to set the allocation bit to point 
to the bank select used during the invalidate regardless of which bank was last allocated. By 
doing so, we guarantee that the next allocated block within the index will not displace any valid 
tag because the allocation bit points to the tag that was just invalidated. 
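
The allocation bit updates described in this section can be sketched as a small state machine for a single Pcache index. The class and method names are hypothetical, and banks are numbered 0 (left) and 1 (right) purely for illustration.

```python
class AllocationBit:
    """Not-last-used allocation pointer for one Pcache index."""

    def __init__(self):
        self.bit = 0

    def on_access(self, bank: int) -> None:
        # A read or write that references `bank` makes the other bank not-last-used.
        self.bit = bank ^ 1

    def on_read_miss(self) -> int:
        # The current value is latched in the MISS_LATCH for the later fill.
        return self.bit

    def on_fill(self, latched: int) -> None:
        # The fill writes the inverse of the latched value.
        self.bit = latched ^ 1

    def on_invalidate(self, bank: int) -> None:
        # Exception: point at the bank that was just invalidated.
        self.bit = bank
```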

12.4.6 Pcache Fill Operation 

A Pcache fill operation is initiated by the I_CF (I-stream cache fill) or D_CF (D-stream cache fill) 
reference. A fill operation can be considered to be a specialized form of a write operation. A fill 
is functionally identical to a Pcache write operation except for the following differences: 

• The bank within the addressed Pcache index is selected by the following algorithm. If a 
validated tag field within the addressed index matches the cache fill address, then the block 
corresponding to this tag is used for the fill operation. If this is not true, then the value of 
the corresponding allocation bit selects which block will be used for the fill. 

• The first fill operation to a block causes all four valid bits of the selected bank to be written 
such that the valid bit of the corresponding fill data is set and the other three are cleared. 
All subsequent fills cause only the valid bit of the corresponding fill data to be set. 
• Any fill operation causes the fill address bits<31:12> to be written into the tag field of the 
selected bank. Tag parity is also written in an analogous fashion. 

• A fill operation causes the allocation bit to be written with the complement of the value 
latched by the corresponding MISS_LATCH during the initial read miss event. 

• A fill operation forces every bit of the corresponding byte mask field to be set. Thus, all eight 
bytes of fill data are always written into the Pcache array on a fill operation. 
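
The valid bit handling during a fill sequence can be sketched as follows. The function name and the `first_fill` flag are hypothetical; how the hardware distinguishes the first fill of a block from later fills is not modelled here.

```python
def fill_valid_bits(valid: list, subblk: int, first_fill: bool) -> list:
    """Return the four valid bits of the selected bank after one quadword fill."""
    if first_fill:
        # First fill to a block: set the filled subblock, clear the other three.
        return [i == subblk for i in range(4)]
    # Subsequent fills: only set the valid bit of the filled subblock.
    updated = list(valid)
    updated[subblk] = True
    return updated
```

A complete hexaword fill therefore ends with all four valid bits set, regardless of the state the block was in before the first fill.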

12.4.7 Pcache Invalidate Operation 

A Pcache invalidate operation is initiated by the INVAL reference. The invalidate operation is 
interpreted as a NOP by the Pcache if the address does not match either tag field in the addressed 
Pcache index. If a match is detected on either tag, an invalidate will occur on that tag. Note that 
this determination is made based only on a match of the tag field bits rather than on satisfying 
all criteria for the Pcache hit condition (Pcache hit factors in valid bits and verified tag parity 
into the equation). 

When an invalidate is to occur, the four valid bits of the matched tag are written with zeros and 
the allocation bit is written with the value of the bank select used during the current invalidate 
operation. 

Also note that an assertion of C%CBOX_HARD_ERR_H during a cache fill command causes the cache 
fill operation to be processed as if it were an INVAL operation. 
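
A minimal sketch of the invalidate operation, assuming each block of the addressed index is represented as a dictionary with hypothetical `tag` and `valid` keys. Note that the match uses the tag bits only, not the full hit condition.

```python
def pcache_invalidate(blocks: list, alloc_bit: int, pa: int) -> int:
    """Invalidate the block whose tag matches pa; return the new allocation bit."""
    tag = (pa >> 12) & 0xFFFFF
    for bank, blk in enumerate(blocks):
        if blk["tag"] == tag:              # valid bits and tag parity are ignored
            blk["valid"] = [False] * 4     # clear all four valid bits
            return bank                    # allocation bit points at this bank
    return alloc_bit                       # no match: treated as a NOP
```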

12.4.8 Pcache IPR Access 

For testability reasons it is important to verify that every Pcache storage bit can be read and 
written in both "0" and "1" states. The easiest way to do this is to provide a mechanism to directly 
read and write every bit in the Pcache array. The data field is already accessible through read 
and write commands. The tag field, tag parity, valid bits and data parity are directly accessible 
through IPR_RD and IPR_WR operations to the Pcache IPRs defined below: 
Normal IPR Address 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00 
| SBZ | 0 | SBZ | IPR number | 

Pcache TAG IPR Address 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00 
| SBZ | 1 | 1 | 0 | SBZ | B | pcache index addr | SBZ | 

where: B = 0 --> select the left bank of the specified index. 
       B = 1 --> select the right bank of the specified index. 

Pcache Data Parity IPR Address 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00 
| SBZ | 1 | 1 | 1 | SBZ | B | pcache index addr | SBA | SBZ | 

where: B = 0 --> select the left bank of the specified index. 
       B = 1 --> select the right bank of the specified index. 
       SBA - subblock address selection 


The format of a Pcache tag IPR is shown in Figure 12-32. 

The tag parity bit is included in the Pcache tag IPR format to allow the user to write bad tag 
parity into the array in order to verify the tag parity logic. Further, the valid bits and allocation 
bit are also included so that the Pcache can be initialized to a known state. 

The format of a Pcache Data Parity IPR is shown in Figure 12-33. This IPR allows the Pcache 
data parity to be directly read and written for testability purposes. 
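
As an illustration, the Pcache array IPR addresses can be composed as sketched below. The bit positions are inferred from the address ranges listed in Table 12-18 (PCTAG 01800000..01801FE0, PCDAP 01C00000..01C01FF8) rather than stated explicitly in the text, so they should be treated as assumptions; the function names are hypothetical.

```python
PCTAG_BASE = 0x01800000   # Pcache tag IPR space (from Table 12-18)
PCDAP_BASE = 0x01C00000   # Pcache data parity IPR space (from Table 12-18)

def pctag_ipr_address(bank: int, index: int) -> int:
    """IPR address of the tag for `bank` (0 = left, 1 = right) at `index` (0..127)."""
    return PCTAG_BASE | ((bank & 1) << 12) | ((index & 0x7F) << 5)

def pcdap_ipr_address(bank: int, index: int, subblk: int) -> int:
    """IPR address of the data parity for one quadword subblock (0..3)."""
    return (PCDAP_BASE | ((bank & 1) << 12) | ((index & 0x7F) << 5)
            | ((subblk & 0x3) << 3))
```

With these assumed positions the highest tag address is `pctag_ipr_address(1, 127)` and the highest data parity address is `pcdap_ipr_address(1, 127, 3)`, which reproduce the upper bounds given in the table.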

12.4.9 Pcache IPR Summary 

The following table summarizes all IPRs associated with the Pcache: 

Table 12-18: Pcache IPRs 

Register Name                                                         IPR Address (in hex) 

PCADR (quadword address of reference causing Pcache parity error)    F0 
PCSTS (status of Pcache parity error)                                 F1 
PCCTL (control state of Pcache operation)                             F2 
PCTAG                                                                 01800000..01801FE0 
PCDAP                                                                 01C00000..01C01FF8 



See Section 12.6 for a description of the PCADR and PCSTS registers. Note that with the 
exception of the Pcache tag IPRs, the addresses of the three other Pcache IPRs are driven into 
the Mbox shifted left two bits. This fact is not reflected in the above table. 
12.4.10 Pcache States Resulting in UNPREDICTABLE operation 

The capability of arbitrarily altering Pcache state through IPR write operations allows for the 
possibility of putting the Pcache into obscure states which cannot be achieved by "normal" 
operation. Two of these states will cause UNPREDICTABLE behavior: 

1. Setting the ELEC_DISABLE bit in PCCTL will cause IPR read operations to the Pcache 
tag or Pcache data parity bits to return incorrect data. Setting the ELEC_DISABLE bit 
will cause IPR write operations to the Pcache tag or Pcache data parity bits to be disabled. 
Setting ELEC_DISABLE with either I_ENABLE or D_ENABLE set may cause Pcache read 
operations to return incorrect data. Setting ELEC_DISABLE with either I_ENABLE or 
D_ENABLE set will cause Pcache write, invalidate and cache fill functions to be disabled. 

2. Through explicit Pcache tag IPR write operations, a user could write both blocks of a Pcache 
index with the same tag, tag parity and valid bit data. If this condition occurs with one or 
more sub-block valid bits set, the Pcache will return invalid data on references corresponding 
to the written tag (note that normal Pcache operation precludes this situation from ever 
occurring). 

12.4.11 Pcache Redundancy Logic 

Due to the extreme density of the Pcache array, the Pcache has a high susceptibility to 
manufacturing defects. As a result, redundancy logic was designed in order to provide a 
mechanism which would allow the Pcache to function correctly in the presence of a small number 
of manufacturing defects. 

The redundancy logic consists of hardware which supports the operation of sixteen extra indices 
which exist in addition to the 128 "regular" indices. If a defect exists in an index which does 
not disturb the function of any column logic, the redundancy logic allows the bad index to be 
replaced by one of the 16 extra indices. If an index is determined to be malfunctioning during 
chip test, a redundant index can be substituted for the bad index by blowing specific fuses on the 
chip through the use of a laser. Blowing these fuses creates logic state transitions on redundancy 
control signals which disable the operation of a set of 4 "regular" indices and enable the 
operation of 4 redundant indices in their place. 

Four sets of four redundancy fuses exist. Each set controls 4 of the 16 redundant indices. Each 
set can map its 4 redundant indices into one of 8 different sets of 4 "regular" indices. The 
redundancy mapping is shown below: 
Figure 12-46: Pcache Address Redundancy Mapping 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00 
|     | RS | X | RED_ADDR | X |     | 

where: RS represents the address bits corresponding to the four sets of four redundancy fuses. 

       The two X's represent the address bits corresponding to the set of four indices which 
       get replaced. 

       RED_ADDR represents the laser-programmable address bits that specify which one of 8 sets 
       of 4 "regular" indices are to be replaced. 



Each set of 4 redundancy fuses consists of three bits to specify the address mapping (specified by 
RED_ADDR above) and 1 bit to enable the redundant indices to operate in place of the specified 
set of "regular" indices. When one or more redundancy elements are blown, another fuse is also 
blown which will set the RED_ENABLE bit in PCCTL (see Figure 12-31). Thus, by reading the 
PCCTL IPR one can determine if one or more redundancy elements has been enabled. 
12.5 MEMORY MANAGEMENT 

The Mbox, the Ebox microcode, and the VMS memory management software implement VAX 
memory management. The Mbox performs the hardware memory management functions 
necessary to process most references in a quick efficient manner. The operating system 
software performs all other functions. For a description of the hardware end of VAX memory 
management, the reader is referred to the Memory Management chapter of the "VAX Architecture 
Standard" (DEC STD 032). For a complete description of the software end of VAX/VMS memory 
management, the reader is referred to the Memory Management chapters of "VAX/VMS Internals 
and Data Structures". 

The Mbox is responsible for the following memory management functions: 

• Performing virtual-to-physical address translations. 

• Maintaining a cache of PTEs to perform the quick translations. 

• Performing access mode checks on memory references. 

• Performing TNV checks on memory references. 

• Performing M=0 checks on memory references. 

• Directly or indirectly invoking a software memory management exception handler due to ACV 
(Access Violation) or TNV (Translation Not Valid) or M=0 faults. 

• Detecting cross-page conditions and performing the corresponding access mode checks. 
12.5.1 NVAX MEMORY STRUCTURE 

12.5.1.1 Virtual Address Space 

The NVAX virtual address space conforms with the description of the VAX virtual address space. 
The space contains four gigabytes (2**32) of memory divided into four regions as shown below: 



Figure 12-47: Virtual Address Space Layout 
NOTE 

NVAX CPU chips at revision 1 implement the original VAX memory management 
architecture in which any reference to a virtual address above BFFFFFFF (hex) falls 
into a reserved region and causes a length violation. NVAX CPU chips at revision 2 or 
later implement the extended S0 space addressing described above. 
12.5.1.2 Physical Address Spaces 

The NVAX hardware addresses a physical address space defined by another four gigabyte region. 
The first seven-eighths of it addresses physical memory. The top one-eighth of this space addresses 
I/O space. Thus, all I/O space addresses can be distinguished by physical address bits<31:29> = 
111 (binary). 

Figure 12-48: Physical Address Space of the NVAX Hardware 
12.5.1.2.1 Physical Address Space Mappings 

The Mbox is designed to accommodate both a 30-bit and 32-bit physical address space as seen at 
the program level while maintaining one physical address space as seen by all NVAX hardware 
external to the Mbox (shown above). These two program level physical address spaces are mapped 
by Mbox hardware into the NVAX physical address space according to the value of the PAMODE 
register. See Figure 12-23 for a description of PAMODE. 

The PAMODE register is accessible by the IPR_RD and IPR_WR commands. When PAMODE=0, 
the 30-bit physical address space seen at the program level is translated into the NVAX physical 
address space as follows: 
Figure 12-49: 30-bit Physical Address Mapping 

Logically speaking, this mapping is accomplished by the Mbox by sign-extending physical 
address<29> into physical address<31:29>. 

When PAMODE=1, the 32-bit physical address space seen at the program level is directly 
translated into the NVAX physical address space: 

Figure 12-50: 32-bit Physical Address Mapping 
12.5.1.3 ADDRESS TRANSLATION AND THE TB 

For a complete description of VAX virtual address translation, the reader is referred to the 
Memory Management chapter of the "VAX Architecture Standard" (DEC STD 032). An overview 
of this process can be found in Section 2.6 of this specification. 

The Mbox performs virtual-to-physical address translations in the S5 pipe when the following 
two conditions are satisfied: 

1. The MAPEN bit is set (MAPEN enables virtual address translations). 

2. M_QUE%S5_QUAL_H<6> indicates that the S5 reference is a virtual reference. 

When both of these conditions are met, the address in M_QUE%S5_VA_H<31:0> is translated by 
the Mbox, and the resulting physical address is driven on M_QUE%S5_PA_H<31:0>. If these 
conditions are not both satisfied, the contents of M_QUE%S5_VA_H<31:0> are treated as a physical 
address and are transferred directly to M_QUE%S5_PA_H<31:0>. 

The TB (translation buffer) is the mechanism by which the Mbox performs quick 
virtual-to-physical address translations. It is a 96-entry read allocate fully associative cache 
of PTEs (Page Table Entries). 

The format of a page table entry and a TB entry are shown below. 
Figure 12-51: PTE and TB format 

Page Table Entry 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00 
| V | PROT | M | S | Physical Page Frame Address | 

where: V    - valid bit 
       PROT - authorized access modes 
       M    - modify bit 
       S    - reserved bit 

TB Entry 

| TBV | TP | TP_BAR | TAG | PROT | M | DP | PFN | 

where: TBV    - TB entry valid bit 
       TP     - even tag parity bit 
       TP_BAR - complement of TP 
       TAG    - virtual address<31:9> 
       PROT   - authorized access modes 
       M      - modify bit 
       DP     - even parity for validated PTE field 
       PFN    - physical page frame address 


Note that the TB entry stores all but three bits of the PTE field. The TB entry does not store the 
S bit because it is not used, and the TB entry does not store the upper two bits of the PTE PFN 
because these bits correspond to a larger physical address space than NVAX uses. The tag field 
stores the virtual page frame address. The TBV bit indicates whether the corresponding entry is 
valid. If TBV is set, then PTE<31> is valid because the TB only caches PTEs whose valid bit is 
set. 

The associativity of each TB entry is implemented by the use of comparators on the TBV and 
tag fields. When a virtual address is driven onto M_QUE%S5_VA_H<31:0> at the start of a cycle, 
each TB tag comparator, whose corresponding TBV bit is set, looks for a match between the 
M_QUE%S5_VA_H virtual page frame address and its corresponding tag. If no comparator finds a 
match, the TB_MISS condition has occurred indicating that no TB entry contains a translation 
for the specified address (see Section 12.5.1.5.2 for discussion of TB_MISSes). 

If one of the entries detects a match (TB_HIT condition), the PFN, PROT, and M fields of 
the corresponding TB entry are read out of the TB. M_QUE%S5_PA_H<31:9> are driven with the 
contents of the accessed PFN. M_QUE%S5_PA_H<8:0> are the untranslated bits addressing a byte 
within a page; therefore, these bits are driven directly from M_QUE%S5_VA_H<8:0>. 

The PROT and M fields, which were driven out of the TB with the PFN, are used by the 
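
The lookup and the assembly of the physical address can be sketched as follows. This is a sequential illustration of what the TB does with parallel comparators; the function name and the dictionary keys (`tbv`, `tag`, `pfn`) are hypothetical.

```python
def tb_translate(tb: list, va: int):
    """Return the translated physical address, or None on a TB_MISS."""
    vpn = (va >> 9) & 0x7FFFFF                 # virtual page frame, VA<31:9>
    for entry in tb:
        if entry["tbv"] and entry["tag"] == vpn:
            pfn = entry["pfn"] & 0x7FFFFF      # drives PA<31:9>
            return (pfn << 9) | (va & 0x1FF)   # PA<8:0> copied from VA<8:0>
    return None                                # no valid entry matched: TB_MISS
```

Because the page size is 512 bytes, the low nine bits pass through untranslated and only the page frame number is replaced.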
memory management exception detection logic to determine ACV and M=0 conditions (See 
Section 12.5.1.5.3). 

TB entries are allocated using a NLU (Not-Last-Used) TB allocation pointer. The TB entry pointed 
to by the NLU allocation pointer is allocated and validated during a TB_TAG_FILL/TB_PTE_FILL 
sequence. The allocation pointer increments in round robin fashion around every TB entry when 
a TB lookup accesses the entry pointed to by the allocation pointer or when a TB_PTE_FILL 
operation is done. Because the allocation pointer is guaranteed not to point to the last entry 
referenced, this scheme implements a not-last-used allocation scheme. 
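
A minimal sketch of the NLU pointer behaviour, assuming a 96-entry TB as stated earlier in this section. The class and method names are hypothetical.

```python
class NluPointer:
    """Round-robin not-last-used allocation pointer for the TB."""

    def __init__(self, entries: int = 96):
        self.entries = entries
        self.ptr = 0

    def _advance(self) -> None:
        self.ptr = (self.ptr + 1) % self.entries

    def on_lookup_hit(self, entry: int) -> None:
        # Move on only if the lookup touched the entry the pointer selects.
        if entry == self.ptr:
            self._advance()

    def on_pte_fill(self) -> int:
        # The entry pointed to is allocated, then the pointer moves on.
        filled = self.ptr
        self._advance()
        return filled
```

In both cases the pointer ends up on an entry other than the one just referenced, which is the not-last-used property described above.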

TB entries can be invalidated in the following ways: 

• An entry can be invalidated by being displaced from the TB by allocation of another PTE to 
the same TB entry. 

• An entry can be invalidated by execution of the TBIS (TB Invalidate Single) command. If the 
specified TBIS virtual address matches a TB tag, the TBV bit corresponding to the matched 
tag is cleared. Clearing the TBV bit invalidates the TB entry (See Section 12.3.13). 

• Entries can be invalidated by execution of the TBIP (TB Invalidate Process) command. TBIP 
causes the most significant bit of all the tag fields to be examined. If this bit is cleared, 
the corresponding TBV bit is cleared. The effect of this operation is to invalidate all PTEs 
corresponding to P0 or P1 space translations (See Section 12.3.14). 

• All entries can be invalidated by the execution of the TBIA (TB Invalidate All) command. 
This command resets the TBV bit of every TB entry (See Section 12.3.15). 
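
The three invalidate commands can be sketched as operations on a list of TB entries. The function names and dictionary keys are hypothetical; the tag is taken to be the 23-bit field VA<31:9>, so its most significant bit corresponds to VA<31>.

```python
def tbis(tb: list, va: int) -> None:
    """TB Invalidate Single: clear TBV on the entry whose tag matches va."""
    vpn = (va >> 9) & 0x7FFFFF
    for entry in tb:
        if entry["tag"] == vpn:
            entry["tbv"] = False

def tbip(tb: list) -> None:
    """TB Invalidate Process: clear TBV where the tag MSB (VA<31>) is 0."""
    for entry in tb:
        if ((entry["tag"] >> 22) & 1) == 0:   # P0 or P1 space translation
            entry["tbv"] = False

def tbia(tb: list) -> None:
    """TB Invalidate All: clear TBV on every entry."""
    for entry in tb:
        entry["tbv"] = False
```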

12.5.1.4 30-bit to 32-bit Physical Address Translations 

When PAMODE=0, the NVAX system is configured such that only 30-bit physical addresses are 
processed at the program level. Since the Mbox and Cbox hardware is designed assuming a 32-bit 
hardware address space, the Mbox must appropriately translate all 30-bit physical addresses into 
32-bit physical addresses based on the mapping scheme shown in Figure 12-49. This is done in 
two ways. 

1. When the Mbox receives a physical address from one of its reference sources, the mapping is 
implemented by an address sign extension scheme involving the upper three address bits. In 
this scheme, address<31:30> are forced to the state of address<29>. 

2. When the Mbox receives a virtual address, virtual address translation occurs normally 
without any sign extension of the resulting physical address. This is possible because the 
corresponding sign extension function is preprocessed on the upper three bits of page frame 
address which is written into the TB during the TB_TAG_FILL operation. 

Note that restrictions exist about how the PAMODE register can be modified. See Section 12.8.2 
for more information. 
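
The sign extension applied to physical addresses arriving at the Mbox can be sketched as follows, together with the I/O space test from Section 12.5.1.2. The function names are hypothetical, and the sketch covers only case 1 above (a physical address received directly from a reference source).

```python
def map_physical(pa: int, pamode: int) -> int:
    """Map a program-level physical address into the 32-bit NVAX address space."""
    pa &= 0xFFFFFFFF
    if pamode == 0:
        pa &= 0x3FFFFFFF          # only 30 bits are meaningful in this mode
        if pa & (1 << 29):
            pa |= 0xC0000000      # force address<31:30> to the state of address<29>
    return pa                     # PAMODE=1: the address is used unchanged

def is_io_space(pa: int) -> bool:
    """I/O space is identified by physical address bits<31:29> = 111 (binary)."""
    return ((pa >> 29) & 0x7) == 0x7
```

With PAMODE=0, the upper half of the 30-bit space (address<29> set) therefore lands in the I/O region at the top of the 32-bit space.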

12.5.1.5 MEMORY MANAGEMENT EXCEPTIONS 

12.5.1.5.1 MME_DATAPATH 

The MME_DATAPATH (Memory Management Datapath) is used to process most memory 
management functions performed by the Mbox. Specifically, it performs the following functions: 

• Creates read references of PTEs in order to obtain virtual address translations not currently 
cached in the TB (See VAX Architecture Standard, DEC STD 032, for a description of this 
process). 

• Creates TB fill operations in order to fill tag and PTE data in the TB. 

• Stores most Mbox internal processor registers. 

• Stores virtual addresses associated with memory management faults. 

• Stores PTE addresses associated with M=0 faults. 
The MME_DATAPATH is illustrated below: 



Figure 12-52: MME Datapath 
12.5.1.5.1.1 MME Register File 

The register file has one write port and two read ports (one for each input to the ALU). The 
register file contains the following longword registers: 



Reg Name       Definition 

PAMODE         Address Mode Register: enables 30 or 32-bit address mapping 
MMAPEN (1,2)   Mbox Map Enable Register: turns on/off virtual translations 
MSLR (1,2)     Mbox System Length Register: Length of System Page Table 
MSBR (1)       Mbox System Base Register: Addr of System Page Table 
MP0LR (1,2)    Mbox P0 Length Register: Length of P0 Page Table 
MP0BR (1)      Mbox P0 Base Register: Addr of P0 Page Table 
MP1LR (1,2)    Mbox P1 Length Register: Length of P1 Page Table 
MP1BR (1)      Mbox P1 Base Register: Addr of P1 Page Table 
MMEADR (1)     MME Faulting Address Register 
MMEPTE (1)     PTE Address Register 
MMESTS (1)     Status of memory management exception 
TBADR          Address of reference causing TB parity error 
TBSTS          Status of TB parity error 
TMP1           Scratch Register 1 
TMP2           Scratch Register 2 

(1) Testability and diagnostic use only; not for software use in normal operation. 
(2) Ebox microcode sends and receives this data to/from the MME register file shifted left 9 bits. 



Note that the datapath associated with this register file performs all bit shifts associated with 
MME processing except for 9-bit shifts required on MMAPEN, MSLR, MP0LR, and MP1LR 
registers. The Ebox microcode sends pre-formatted data to these registers such that the data 
has been pre-shifted left nine bit positions. This facilitates the MME datapath implementation. 
IPR_RD operations from these registers send data back to the Ebox in the same format. Thus, the 
Ebox microcode will re-format the data back into the standard formats illustrated in Table 12-3. 

Note that a 9-bit left shift is performed on MMAPEN so that the contents of MMAPEN can be 
used to increment a virtual address by a page in order to perform cross page check operations. 
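
The effect of the 9-bit pre-shift can be illustrated with a small sketch. It assumes the VAX page size of 512 bytes and a map enable value of 1 in bit 0; the helper names are hypothetical and the code is a software illustration, not the datapath implementation.

```python
MASK32 = 0xFFFF_FFFF
SHIFT = 9  # registers marked with note 2 are exchanged shifted left 9 bits


def to_mme_format(ipr_value: int) -> int:
    """Format in which Ebox microcode sends the value to the MME register file."""
    return (ipr_value << SHIFT) & MASK32


def from_mme_format(stored: int) -> int:
    """Restore the standard format after an IPR_RD from the MME register file."""
    return stored >> SHIFT


def next_page_va(va: int, mmapen_stored: int) -> int:
    """Add the stored MMAPEN value to a virtual address.

    With map enable = 1, the stored value is 1 << 9 = 512, i.e. one page.
    """
    return (va + mmapen_stored) & MASK32
```

Because the stored MMAPEN value equals the page size when mapping is enabled, a single A+B operation yields the address on the following page for the cross page check.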

The MME_ADDR latch stores the address which was driven on M_QUE%S5_VA_H<31:0> 
during the previous cycle. The MME_DATA latch stores the data which was driven on 
M_QUE%S5_DATA_H<31:0> during the previous cycle. The A input to the ALU is driven 
from either MME_ADDR, MME_DATA, or the A read port of the register file. 

12.5.1.5.1.2 MME ALU 

The ALU (Arithmetic Logic Unit) performs the following functions: 

• pass A: used for receiving addresses and data from the main S5 pipe 

• pass B: used for reading/writing registers 

• A + B: used to generate PTE addresses (note 9-bit right shift on A input) 

• A - B: used for page table length checks of P0 and S0 space references (note 7-bit right shift 
on A input) 

The output of the ALU can write the following: 

• address field of the MME_LATCH (to generate PTE reads, TB tag fills and TB PTE fills) 

• data field of the MME_LATCH (to return requested IPR read data) 

• the register file 
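
The two arithmetic functions serve the page table length check and PTE address generation defined by the VAX architecture. The sketch below follows the architectural definitions (virtual page number in va<29:9>, longword PTEs) and deliberately does not model the shifted operand formats used inside the MME datapath. All names are hypothetical.

```python
PAGE_SHIFT = 9                 # 512-byte pages
VPN_MASK = (1 << 21) - 1       # virtual page number is va<29:9>


def vpn(va: int) -> int:
    return (va >> PAGE_SHIFT) & VPN_MASK


def region(va: int) -> str:
    """Select the address space from va<31:30>."""
    return ("P0", "P1", "S0", "reserved")[(va >> 30) & 3]


def length_violation(va: int, length_reg: int) -> bool:
    """Architectural page table length check for the region of va."""
    if region(va) == "P1":
        return vpn(va) < length_reg    # P1 space grows toward lower addresses
    return vpn(va) >= length_reg       # P0 and S0 space


def pte_address(va: int, base_reg: int) -> int:
    """PTE address = base register + 4 * VPN (one longword per PTE)."""
    return (base_reg + 4 * vpn(va)) & 0xFFFF_FFFF
```

For example, a system space address on virtual page 2 with a base register of 1000 (hex) selects the PTE at 1008 (hex), and passes the length check only if the length register is greater than 2.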

12.5.1.5.1.3 MME_SEQ 

The MME_SEQ is a state machine which controls sequencing of the MME_DATAPATH. It controls 
which devices drive and latch data in the MME_DATAPATH, what ALU function is to be executed, 
and what command gets generated and latched in the MME_LATCH. The possible MME state 
sequences of the MME_SEQ are illustrated by the following two diagrams: 

Figure 12-53: MME Sequences 

[Flow charts not reproduced. Legible box labels: START OF TB_MISS SEQUENCE; START OF MME 
IPR_WR SEQUENCE; LOAD ADDRESSED ...; END OF IPR_WR SEQUENCE; START OF ACV/TNV/M=0 
SEQUENCE; CONDITIONALLY LOAD TMP1 FROM MME_ADDR; CONDITIONALLY UPDATE MME_FAULT_ADDR, 
MME_STAT; IF M=0 CONDITION: GENERATE PTE ADDR AND CONDITIONALLY UPDATE MME PTE ADDR; 
END OF ACV/TNV/M=0 SEQUENCE; START OF MME IPR_RD SEQUENCE; LOAD MME_LATCH WITH ADDRESSED 
MME IPR AND ISSUE IPR DATA CMD; END OF IPR_RD SEQUENCE.] 



There are five distinct entry points into the MME sequences: 

• TB_MISS Entry Point: Whenever a TB_MISS condition is detected on an Ibox or Ebox 
reference, the MME_SEQ executes the sequence defined by the TB_MISS Entry Point. 

• Cross Page Entry Point: The MME_SEQ executes the Cross Page Sequence in order to check 
for MME faults which may exist on the upper page of a reference that crosses a page boundary. 

• ACV/TNV/M=0 Entry Point: The MME_SEQ can execute this sequence when an ACV, TNV, 
or M=0 condition is detected on an S5 reference, or when an ACV or TNV condition is detected 
during the TB miss sequence. 

• MME IPR_RD Entry Point: The MME_SEQ executes this flow when an Mbox IPR register 
located in the MME_DATAPATH is addressed by an IPR_RD command. 

• MME IPR_WR Entry Point: The MME_SEQ executes this flow when an Mbox IPR register 
located in the MME_DATAPATH is addressed by an IPR_WR command. 

Once an MME sequence starts, the processing of all Ibox and Ebox references is inhibited until 
the sequence completes. Once the MME sequence terminates, normal processing resumes and 
the original reference which initiated the MME sequence will be retried. 

12.5.1.5.2 TB MISS SEQUENCE 

When memory management is enabled (MAPEN=1) and no valid tag entry in the TB matches 
the corresponding virtual page frame address applied on M_QUE%S5_VA_H<31:9>, the TB does not 
contain the necessary translation information to convert the address to physical space. In this 
situation, the TB asserts its TB_MISS signal which initiates a series of sequential events that 
will cause the proper PTE to be written into the TB. 
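
As a functional illustration of what a TB miss must accomplish, the following sketch models the TB as a dictionary keyed by va<31:9> and handles a miss on a system space reference. It is a simplified software model under stated assumptions: the 21-bit PFN field follows the standard VAX PTE format, only system space is handled, and all names (translate_system, phys_mem, and so on) are hypothetical.

```python
PAGE_SHIFT = 9
PTE_VALID = 1 << 31            # PTE<31>, valid bit
PFN_MASK = (1 << 21) - 1       # assumed 21-bit page frame number field


class LengthViolation(Exception):
    pass


class TranslationNotValid(Exception):
    pass


def translate_system(va, tb, sbr, slr, phys_mem):
    """Translate a system space VA, filling the TB on a miss."""
    tag = va >> PAGE_SHIFT                 # va<31:9> is compared against TB tags
    if tag not in tb:                      # TB_MISS
        vpn = tag & 0x1FFFFF
        if vpn >= slr:                     # page table length check
            raise LengthViolation(hex(va))
        pte = phys_mem[sbr + 4 * vpn]      # SPTE read uses a physical address
        if not pte & PTE_VALID:            # invalid PTEs are never cached
            raise TranslationNotValid(hex(va))
        tb[tag] = pte                      # tag fill plus PTE fill
    pfn = tb[tag] & PFN_MASK
    return (pfn << PAGE_SHIFT) | (va & 0x1FF)
```

After the fill, a retry of the same reference hits in the TB and translates without another page table read.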

12.5.1.5.2.1 Single Miss Sequence 

A single miss sequence is defined as a TB miss sequence with only one TB miss occurring 
during the sequence. The following series of events characterizes a single TB miss sequence 
(see Figure 12-53 for a flow chart description of this sequence): 

• cycle 1: TB asserts TB_MISS. S5 reference is aborted (will be retried later). MME_ADDR 
latches M_QUE%S5_VA_H. 

• cycle 2: TMP1 is loaded from MME_ADDR in order to store the TB miss address in the MME 
register file. 

• cycle 3: The proper page table length check is performed using TMP1, the appropriate XLR 
and a subtract ALU operation. If a length violation exists, the execution sequence continues 
in the ACV/TNV/M=0 sequence (See Section 12.5.1.5.3.6). 

• cycle 4: The address field of the MME_LATCH is loaded with the TMP1 fault address and 
the MME_LATCH is validated with a TB_TAG_FILL command. 

• cycle 5: The TB_TAG_FILL command executes in S5 (assuming no Cbox reference took 
priority) to allocate a TB entry corresponding to the TB miss address. 

The corresponding PTE address is formed using TMP1, the appropriate XBR and the A+B 
ALU operation. The PTE DREAD is loaded into the MME_LATCH. 

• cycle 6: The PTE DREAD is started in S5 (assuming no Cbox reference took priority). If 
this is an SPTE (System Page Table Entry) DREAD, this reference is physical and, therefore, 
cannot have a TB_MISS and/or TNV condition associated with it. If this is a PPTE 
(Process Page Table Entry) DREAD, this reference is virtual and can have a TB_MISS and/or 
TNV condition associated with it. Since a single miss sequence is being described here, a 
PPTE DREAD hits in the TB by definition (see Section 12.5.1.5.2.2 for a description of when 
this reference misses). 

Note that no ACV protection checks are performed on this DREAD because it is an Mbox PTE 
DREAD. No TNV checks are performed because only PTEs with PTE<31> set are cached in 
the TB. No M=0 check is performed since this is strictly a read operation. Assuming no TB miss 
problems occurred, the address is now properly translated and the DREAD continues into S6. 

• cycle x: The PTE data is available on the M%MD_BUS_H<31:0>. This data is latched in the 
address field of the MME_LATCH. ACV/TNV checks are performed on the protection and 
valid bit fields of the incoming PTE data. If an ACV/TNV condition is detected, the memory 
management sequence continues in the ACV/TNV/M=0 sequence (See Section 12.5.1.5.3.6). 
If neither condition is detected, the MME_LATCH is validated with the TB_PTE_FILL 
command. 

• cycle x+1: The TB_PTE_FILL command is executed in S5 (assuming no other Cbox command 
took priority) to load the PTE into the TB and validate the TB entry. Normal processing 
resumes and the reference which caused the original TB miss will be retried. 

12.5.1.5.2.2 Double Miss Sequence 

When the MME_DATAPATH generates a PPTE DREAD in order to resolve a TB miss, the PPTE 
address is itself a system virtual address. Therefore, it is possible for the PPTE DREAD to 
generate a second TB miss. In this case, the PPTE DREAD TB miss must be processed first in 
order to translate the PPTE DREAD address. Following this, the original TB miss sequence can 
resume in order to translate the initial faulting address. This scenario is called a double TB miss 
and is shown below (see Figure 12-53 for a flow chart description of this sequence): 

• cycle 1: TB asserts TB_MISS. S5 reference is aborted (will be retried later). MME_ADDR 
latches M_QUE%S5_VA_H. 

• cycle 2: TMP1 is loaded from MME_ADDR in order to store the TB miss address in the MME 
register file. 

• cycle 3: The proper page table length check is performed using TMP1, the appropriate PXLR 
and a subtract ALU operation. If a length violation exists, the execution sequence continues 
in the ACV/TNV/M=0 sequence (See Section 12.5.1.5.3.6). 

• cycle 4: The address field of the MME_LATCH is loaded with the TMP1 fault address as the 
MME_LATCH is validated with a TB_TAG_FILL command. 

• cycle 5: The TB_TAG_FILL command executes in S5 (assuming no Cbox reference took 
priority) to allocate a TB entry corresponding to the TB miss address. 

The corresponding PPTE address is formed using TMP1, the appropriate PXBR and the A+B 
ALU operation. The PPTE DREAD is loaded into the MME_LATCH. Note that because the 
Mbox generated a PPTE DREAD as part of a TB miss sequence, the virtual reference is loaded 
into the MME_LATCH with the ACV/M=0 reference qualifier cleared so that ACV checks will 
not be performed on the reference. 

• cycle 6: The PPTE DREAD is started in S5 (assuming no Cbox reference took priority). The 
TB asserts TB_MISS again because the PPTE address translation was not present in the TB. 
MME_ADDR latches the PPTE DREAD address and the DREAD is aborted. 

• cycle 7: TMP2 is loaded from the MME_ADDR with the PPTE DREAD address. 

• cycle 8: The system page table length check is performed using TMP2, SLR and the A-B ALU 
operation. If a length violation exists, the execution sequence continues in the ACV/TNV/M=0 
sequence (See Section 12.5.1.5.3.6). 

• cycle 9: The address field of the MME_LATCH is loaded with the TMP2 PPTE fault address 
as the MME_LATCH is validated with a TB_TAG_FILL command. 

• cycle 10: The TB_TAG_FILL command executes in S5 (assuming no Cbox reference took 
priority) to allocate a TB entry corresponding to the TB miss address. Note that the TB entry 
that is allocated destroys the previous TB entry allocation for the original TB miss because 
the NLU TB allocation pointer has not moved. 

The corresponding SPTE address is formed using TMP2, SBR and the A+B ALU operation. 
The SPTE DREAD is loaded into the MME_LATCH. 

• cycle 11: The SPTE DREAD is started in S5 (assuming no Cbox reference took priority). Note 
that this DREAD has a physical address. Therefore, no memory management problem can 
occur on this read. 

• cycle x: The SPTE data is available on the M%MD_BUS_H<31:0>. This data is latched in the 
address field of the MME_LATCH. ACV/TNV checks are performed on the protection and 
valid bit fields of the incoming PTE data. If an ACV/TNV condition is detected, the memory 
management sequence continues in the ACV/TNV/M=0 sequence (See Section 12.5.1.5.3.6). 
If neither condition is detected, the MME_LATCH is validated with the TB_PTE_FILL 
command. 

• cycle x+1: The TB_PTE_FILL command is executed in S5 (assuming no other Cbox command 
took priority) to load the SPTE into the TB and validate the TB entry. Note that the NLU 
TB allocation pointer is incremented on a TB_PTE_FILL operation. 

In order to re-allocate a TB entry for the original TB miss address, the address field of the 
MME_LATCH is loaded with TMP1 while the command field is loaded with a TB_TAG_FILL 
command. 

• cycle x+2: The TB_TAG_FILL command is executed in S5 (assuming no other Cbox command 
took priority) to re-allocate a TB entry corresponding to the original TB miss. 

The original PPTE address is re-generated using TMP1, the appropriate PXBR and the A+B 
ALU operation. The PPTE DREAD is loaded into the MME_LATCH (ACV checks are once 
again disabled for this reference). 

• cycle x+3: The PPTE DREAD is started in S5 (assuming no Cbox reference took priority). 

Note that no ACV protection checks are performed on this DREAD because it is an Mbox PTE 
DREAD. No TNV checks are performed because only PTEs with PTE<31> set are cached in 
the TB. No M=0 check is performed since this is strictly a read operation. The PPTE DREAD 
address is now properly translated. 

• cycle y: The PPTE data is available on M%MD_BUS_H<31:0>. This data is latched in the 
address field of the MME_LATCH. ACV/TNV checks are performed on the protection and 
valid bit fields of the incoming PTE data. If an ACV/TNV condition is detected, the memory 
management sequence continues in the ACV/TNV/M=0 sequence (See Section 12.5.1.5.3.6). 
If neither condition is detected, the MME_LATCH is validated with the TB_PTE_FILL 
command. 

• cycle y+1: The TB_PTE_FILL command is executed in S5 (assuming no other Cbox command 
took priority) to load the PPTE into the TB and validate the TB entry. Normal processing 
resumes and the reference which caused the original TB miss will be retried. 
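
The order of Mbox-generated commands in the single and double miss sequences can be summarized by the sketch below. It lists only the commands placed in the MME_LATCH for a fault-free sequence; cycle timing, Cbox arbitration, and the ACV/TNV exits are not modeled. The function name and the command labels are hypothetical.

```python
def tb_miss_commands(process_space: bool, ppte_read_misses: bool = False):
    """Return the Mbox-generated S5 commands, in order, for one TB miss."""
    if not process_space:
        # System space: the SPTE read is physical and cannot miss in the TB.
        return ["TB_TAG_FILL(orig)", "SPTE DREAD", "TB_PTE_FILL(orig)"]

    cmds = ["TB_TAG_FILL(orig)", "PPTE DREAD"]
    if ppte_read_misses:
        # Double miss: translate the PPTE address first, then re-allocate
        # the TB entry for the original address and repeat the PPTE read.
        cmds += [
            "TB_TAG_FILL(ppte)",
            "SPTE DREAD",
            "TB_PTE_FILL(ppte)",
            "TB_TAG_FILL(orig)",
            "PPTE DREAD",
        ]
    cmds.append("TB_PTE_FILL(orig)")
    return cmds
```

In the double miss case the tag fill for the original address appears twice, reflecting the re-allocation required after the first allocation is overwritten.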

MICROCODE RESTRICTION 

To avoid a potential infinite loop case whereby the Mbox is stuck in the TB double 
miss sequence forever, the Ebox microcode must guarantee that it issues a non-STORE 
instruction other than TBIA, TBIS, or TB_TAG_FILL during the cycle immediately 
preceding the cycle it issues either a TBIA, TBIS or TB_TAG_FILL instruction. 

12.5.1.5.3 ACV/TNV/M=0 

12.5.1.5.3.1 ACV/TNV/M=0 Fault Handling: 

In order for an ACV, TNV, or M=0 fault to be processed, the following steps must occur: 

1. The Mbox must detect the ACV/TNV/M=0 condition. 

2. The Ebox microcode must be invoked to start processing the condition. 

3. The Ebox microcode must probe Mbox state in order to determine which fault occurred and 
how it should be processed. 

4. The Ebox microcode must service the fault condition directly, or it must invoke an operating 
system memory management service routine to service the fault. 

5. If the memory management fault was not fatal to the process, normal instruction execution 
resumes by restarting the instruction corresponding to the memory management fault after 
servicing the fault. 

12.5.1.5.3.2 ACV detection: 

The protection field of a PTE indicates the authorized access rights for each execution mode. 
When a reference causes the TB to access a PTE, the protection field of the PTE corresponding 
to the reference is driven out of the TB. The ACV (Access Violation) detection logic uses the PTE 
protection field, M_QUE%S5_AT_H<1:0>, and the appropriate CPU execution mode from the Ebox 
(i.e. user, supervisor, executive, kernel) to detect access violations. If, for example, the protection 
field indicates a "read-only" access in user mode, the CPU execution mode specifies user mode, 
and M_QUE%S5_AT_H<1:0> indicates write access, then an ACV condition is flagged since a write 
reference is not allowed to this page in user mode. 

A 2:1 MUX controls the source of the CPU execution mode. The CPU execution mode information 
is normally taken directly from the current mode field of the PSL (PSL<25:24>). On PROBE 
references, however, the CPU execution mode is driven from E%MMGT_MODE_H<1:0> in order to 
check for ACV conditions for an execution mode which the CPU is not currently in. 
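
The protection check and the mode selection can be sketched as follows. The Protection class is a hypothetical abstraction that records the least privileged mode allowed to read and to write; it does not reproduce the encoding of the PTE protection field. The mode numbering follows the PSL current mode encoding (kernel = 0 through user = 3).

```python
from dataclasses import dataclass
from typing import Optional

KERNEL, EXECUTIVE, SUPERVISOR, USER = 0, 1, 2, 3   # PSL<25:24> encoding


@dataclass(frozen=True)
class Protection:
    """Least privileged mode allowed each access; None means no mode is."""
    read_mode: Optional[int]
    write_mode: Optional[int]


def check_mode(psl_cur_mode: int, probe_mode: int, is_probe: bool) -> int:
    """2:1 mux: PROBE references supply their own mode for the check."""
    return probe_mode if is_probe else psl_cur_mode


def access_violation(prot: Protection, mode: int, write_intent: bool) -> bool:
    """True if the access is not permitted in the given mode."""
    limit = prot.write_mode if write_intent else prot.read_mode
    return limit is None or mode > limit
```

For the example in the text, a page that is read-only in user mode is Protection(read_mode=USER, write_mode=None); a user mode write to it returns True (ACV), while a user mode read returns False.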

An ACV condition is also generated when a PTE reference fails to satisfy the page length check 
corresponding to the virtual space of the reference or when the virtual reference falls into the 
reserved page region of virtual memory (FFFFFE00-FFFFFFFF hex). Either condition is reported 
as an ACV length violation. 

An ACV check is also performed on the protection field of all PTEs which have just been sent to 
the Mbox due to an earlier Mbox DREAD issued during the TB_MISS sequence. 

ACV protection and length checks are performed on all Ibox and Ebox references and on all 
MME_CHKs. ACV page length checks are performed on all PTE addresses. However, ACV 
protection checks are never performed on PTE read references generated by the Mbox. 

Note that the ACV protection condition is disabled from occurring during any cycle where the 
reference is aborted. 

When an ACV condition occurs, the MME_SEQ is invoked to execute the ACV/TNV/M=0 sequence. 
ACV checks only occur on virtual addresses when memory management is enabled and when the 
reference indicates that memory management checks should be done (i.e. M_QUE%S5_QUAL_H<2> 
= 1). 

12.5.1.5.3.3 TNV detection 

When the PTE valid bit is clear, it indicates that the corresponding PTE page frame address 
translation is not valid. This is called a Translation Not Valid Fault (TNV). TNV detection 
only occurs during the TB_MISS sequence when the Mbox receives PTE data from the Pcache 
or Cbox such that the PTE valid bit (PTE<31>) is clear. When a TNV fault is detected, the 
MME_SEQ interrupts the TB_MISS sequence and invokes the ACV/TNV/M=0 sequence. By 
doing so, the invalid PTE is never cached in the TB and a memory management fault is recorded 
(See Section 12.5.1.5.3.5 on recording memory management faults). 

12.5.1.5.3.4 M=0 detection: 

When a virtual reference causes the TB to access a PTE, the modify bit of the PTE is read 
out of the TB. A cleared modify bit indicates that the corresponding page has not been written 
to. If the valid bit of the PTE is set, and the modify bit is clear and the access type of the S5 
reference indicates an intention to modify the page (e.g. write or modify access type), then the 
Mbox must initiate the proper sequence of events to process this "M=0" condition. The M=0 check 
is performed when memory management is enabled and a virtual reference hits in the TB. 

Note that the M=0 condition is disabled from occurring during any cycle where the reference is 
aborted. 
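
The M=0 condition reduces to a simple predicate, sketched below. The valid bit position (PTE<31>) is stated in this specification; the modify bit position (PTE<26>) is taken from the standard VAX PTE format and is an assumption here, as is the function name.

```python
PTE_VALID = 1 << 31    # PTE<31>
PTE_MODIFY = 1 << 26   # PTE<26> in the standard VAX PTE format (assumed)


def m0_condition(pte: int, write_or_modify_intent: bool, aborted: bool = False) -> bool:
    """True when a TB hit requires M=0 processing."""
    if aborted:
        return False   # the condition is disabled when the reference is aborted
    valid = bool(pte & PTE_VALID)
    modified = bool(pte & PTE_MODIFY)
    return valid and not modified and write_or_modify_intent
```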



12.5.1.5.3.5 Recording ACV/TNV/M=0 Faults 

In order for the microcode to determine the nature of the memory management fault detected 
by the Mbox, the Mbox must record the necessary fault information. The fault information is 
recorded in Mbox IPRs which can be read by Ebox microcode. The fault information is stored in 
three of the registers in the MME register file which are accessible to microcode by IPR reads 
and writes: 

• The MMEADR register stores the virtual address associated with the ACV, TNV or M=0 fault. 
As per SRM requirements, if the ACV/TNV fault occurred by referencing a PTE during a TB 
miss sequence, the MMEADR stores the original address and not the PTE address. 

• The MMEPTE register stores the virtual or physical address of the Page Table Entry 
corresponding to a virtual reference upon which an M=0 condition has been detected. 

• The MMESTS register stores state which indicates to the microcode the context and type of 
fault corresponding to the ACV/TNV/M=0 condition. The format of MMESTS is shown below: 



Figure 12-55: IPR EA (hex), MMESTS 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|  LOCK  |  SRC   | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|FAULT| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| M|  |LV| :MMESTS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                            |
                                                                            PTE_REF --------+



Table 12-19: MMESTS Field Descriptions 

Name      Extent  Type  Description 

LV        0       RO    Indicates ACV fault occurred due to length violation. 
PTE_REF   1       RO    Indicates ACV/TNV fault occurred on PTE reference corresponding 
                        to MMEADR. 
M         2       RO    Indicates corresponding reference had write or modify intent. 
FAULT     15:14   RO    Indicates nature of memory management fault. See FAULT 
                        encodings below. 
SRC       28:26   RO    Complemented shadow copy of LOCK bits. However, the SRC bits 
                        do not get reset when the LOCK bits are cleared. 
LOCK      31:29   RO    Indicates the lock status of MMESTS. See LOCK encodings below. 
                        This field is cleared on E%FLUSH_MBOX_H. 


Table 12-20: LOCK Encodings 

Defined LOCK 
values (binary)  Definition 

000              MMESTS, MMEADR and MMEPTE are unlocked. 
001              Valid IREAD fault is stored (no other IREAD fault can overwrite MMESTS, 
                 MMEADR, or MMEPTE). 
011              Valid Ibox specifier fault is stored (only an Ebox reference fault can overwrite 
                 MMESTS, MMEADR, or MMEPTE). 
111              Valid Ebox fault is stored (MMESTS, MMEADR, and MMEPTE are 
                 completely locked). 



Note that the encodings for the SRC bits are the complemented version of the LOCK bits. Thus, 
for example, a fully locked SRC encoding is 000. 

Table 12-21: FAULT Encodings 

Defined FAULT 
values (binary)  Definition 

01               ACV Fault. This is the highest priority fault in the presence of multiple 
                 simultaneous faults. 
10               TNV Fault. This is the next highest priority fault. 
11               M=0 Fault. This is the lowest priority fault. 
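
The field layout in Figure 12-55 and the encodings in Tables 12-20 and 12-21 can be applied with a small decoder such as the one sketched below. It is an illustration for reading register dumps; the function and dictionary names are hypothetical.

```python
LOCK_NAMES = {
    0b000: "unlocked",
    0b001: "IREAD fault stored",
    0b011: "Ibox specifier fault stored",
    0b111: "Ebox fault stored",
}
FAULT_NAMES = {0b01: "ACV", 0b10: "TNV", 0b11: "M=0"}


def decode_mmests(value: int) -> dict:
    """Split a 32-bit MMESTS value into its defined fields."""
    return {
        "LOCK": (value >> 29) & 0b111,     # MMESTS<31:29>
        "SRC": (value >> 26) & 0b111,      # MMESTS<28:26>
        "FAULT": (value >> 14) & 0b11,     # MMESTS<15:14>
        "M": (value >> 2) & 1,             # MMESTS<2>
        "PTE_REF": (value >> 1) & 1,       # MMESTS<1>
        "LV": value & 1,                   # MMESTS<0>
    }
```

For example, a value with LOCK = 111, SRC = 000, FAULT = 10 and low bits 110 decodes as a TNV on a PTE reference, with modify intent on the original reference, recorded for an Ebox reference.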



Due to the macropipeline design, the MMEADR, MMEPTE and MMESTS registers must be 
conditionally loaded in a prioritized fashion. These registers are loaded depending on the relative 
states of their current contents and on the context of the current fault. If the MMESTS register 
is empty, the current fault state is always loaded. If the MMESTS register contains a valid 
fault condition, the MMEADR, MMEPTE and MMESTS are only loaded if the current fault is 
associated with a pipe stage further along in the pipe than the stage corresponding to the stored 
MMESTS state. This loading priority is necessary because these memory management faults 
must be reported within the context of the execution of the instruction they are associated with. 
A fault detected on an Ebox reference is loaded provided that another Ebox reference fault is 
not already loaded. Faults detected on Ibox specifier references are only loaded if no Ebox or 
Ibox specifier reference fault is currently stored. Faults on Ibox I-stream references are only 
loaded if the MMESTS register is empty. In effect, the MMESTS register captures the first 
memory management exception that will be associated with Ebox execution. Stated differently, 
it captures the fault which occurs farthest along in the macropipeline. 

The LOCK field of MMESTS specifies the source of the faulting reference currently stored 
in MMESTS. Thus, the decision to load another faulting reference into MMESTS is made by 
examining the bits of the LOCK field. 

The FAULT field is set in a prioritized manner. That is, an ACV fault takes precedence over 
a TNV or M=0 fault. A TNV fault takes precedence over an M=0 fault. Therefore, if multiple 
pending fault conditions are true, only the fault condition with the highest priority is reported in 
the MMESTS register. 
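
Both priority rules can be expressed compactly, as in the following sketch. It relies on the observation that the LOCK encodings of Table 12-20 increase numerically with pipeline depth, so a simple comparison reproduces the loading rules described above. The names are hypothetical and the code models only the decision, not the register update.

```python
# LOCK encodings from Table 12-20, ordered by depth in the macropipeline.
UNLOCKED, IREAD, IBOX_SPECIFIER, EBOX = 0b000, 0b001, 0b011, 0b111


def should_record(current_lock: int, new_source: int) -> bool:
    """Record a new fault only if its source is further along in the pipe."""
    return new_source > current_lock


FAULT_PRIORITY = ("ACV", "TNV", "M=0")   # highest priority first


def reported_fault(pending):
    """Return the highest priority fault among the pending conditions."""
    for fault in FAULT_PRIORITY:
        if fault in pending:
            return fault
    return None
```

An Ebox fault therefore replaces a stored IREAD or specifier fault, while an IREAD fault is recorded only when MMESTS is unlocked.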

When the Ebox starts the memory management exception microflow, it issues an IPR_RD to the 
MMESTS to determine the nature of the memory management fault. The MMESTS register is 
automatically unlocked by resetting the LOCK field when the E%FLUSH_MBOX_H signal is asserted 
by the Ebox. 

12.5.1.5.3.6 ACV/TNV/M=0 MME_DATAPATH Sequence 

When an ACV/TNV/M=0 condition occurs, the MME_DATAPATH performs the following actions 
in order to record the fault for subsequent use by the Ebox microcode: 

• cycle 1: ACV, TNV, or M=0 condition is detected. MME_ADDR latches the M_QUE%S5_VA_H 
address. Note that the S5 reference is NOT aborted. 

If the faulting reference is associated with an Ebox reference, M%MME_TRAP_L is asserted to 
the microsequencer to generate a memory management microtrap. If the faulting reference 
was associated with a DEST_ADDR command, the MME fault is logged in the corresponding 
PA_QUEUE entry. In all other cases (IREADs and Ibox D-stream reads) M%MME_FAULT_H 
qualifies the M%MD_BUS_H indicating that the requested data had a memory management 
problem. 

• cycle 2: If this ACV/TNV/M=0 sequence was not invoked from a previous MME_SEQ flow, the 
contents of MME_ADDR are loaded into TMP1. If this sequence was invoked from another 
MME_SEQ flow, TMP1 is not loaded because it already contains the original address that 
must be reported for this ACV/TNV condition. 

• cycle 3: The source of the reference which directly/indirectly invoked the MME fault is 
compared to MMESTS<31:29> (the LOCK field) to determine whether this fault should be 
recorded in MMEADR, MMEPTE, and in MMESTS. If a previous fault of equal or greater 
priority is already stored in MMESTS, MMESTS, MMEADR, and MMEPTE are not updated. 

If the LOCK field indicates that this fault should be recorded, MMEADR is loaded from TMP1 
and MMESTS is updated as follows: 

Table 12-22: MMESTS State Update 

fault type                                                   MMESTS<15:14>  MMESTS<2:0> 

ACV without MME_SEQ active (no modify intent)                01             000 
ACV without MME_SEQ active (modify intent)                   01             100 
M=0                                                          11             100 
length violation on ref during TB_MISS seq (no modify)       01             001 
length violation on ref during TB_MISS seq (modify intent)   01             101 
length violation on PTE ref during TB_MISS seq (no modify    01             011 
intent on original reference) 
length violation on PTE ref during TB_MISS seq (modify       01             111 
intent on original reference) 
TNV on PTE ref during TB_MISS seq (no modify intent on       10             010 
original reference) 
TNV on PTE ref during TB_MISS seq (modify intent on          10             110 
original reference) 



The LOCK field of MMESTS is updated appropriately. 

• cycle 4: If MMESTS was updated during cycle 3, and the fault was M=0, the corresponding 
PTE address is formed using TMP1, the appropriate XBR and A+B ALU operation. The PTE 
address is then loaded into MMEPTE. 
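
The rows of Table 12-22 follow directly from the field definitions: MMESTS<15:14> is the FAULT encoding and MMESTS<2:0> is the concatenation of the M, PTE_REF and LV bits. The sketch below computes both values; the function name and argument names are hypothetical.

```python
FAULT_CODE = {"ACV": 0b01, "TNV": 0b10, "M=0": 0b11}   # Table 12-21


def mmests_update(fault: str, modify_intent: bool,
                  pte_ref: bool = False, length_violation: bool = False):
    """Return (MMESTS<15:14>, MMESTS<2:0>) for a fault being recorded."""
    low = (int(modify_intent) << 2) | (int(pte_ref) << 1) | int(length_violation)
    return FAULT_CODE[fault], low
```

For instance, a length violation on a PTE reference with modify intent on the original reference gives (01, 111), matching the corresponding table row.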

12.5.1.5.3.7 Microcode Invocation of ACV/TNV/M=0 

Microcode is invoked for ACV/TNV/M=0 faults in three different ways: 

• If the faulting reference originated from the Ebox, then the Mbox asserts M%MME_TRAP_L to 
invoke a memory management microtrap. M%MME_TRAP_L is asserted at the end of the cycle 
in which the ACV/TNV/M=0 fault was detected. Thus, from a microcode point of view, the 
microtrap happened before the EM_LATCH contents were retired. This microtrap invokes 
the ACV/TNV/M=0 microflow which handles the fault in the context of the reference executing 
in the Ebox. 

• If the faulting reference is a read sourced by the Ibox (either a D-stream or I-stream read), 
M_QUE%S5_QUAL_H<0> is set indicating that a memory management fault should be forced 
on this read. When the read propagates into S6, the Mbox forces the Pcache to hit and 
returns invalid data. This data, however, will be qualified with the M%MME_FAULT_H signal 
to indicate that the data is invalid and that an ACV/TNV/M=0 fault is associated with this 
data. When the Ebox references the corresponding D-stream operand, or requires the decode 
of the corresponding I-stream data, a microtrap is generated by the Ebox to invoke the 
ACV/TNV/M=0 microflow. 

If an MME fault occurs on the address of the address of an operand (i.e. Ibox decoding a 
deferred specifier), the Mbox records the fault in MMEADR and MMESTS in the usual way 
and returns data qualified by M%MME_FAULT_H. In some instances, the Ibox must issue a 
second reference to the Mbox based on the address returned by the first reference. Due to 
the fault however, the Ibox cannot issue a valid operand read address since the data returned 
by the first reference was invalid. In this case, the Ibox issues a read qualified with the 
I%FORCE_MME_FAULT_H signal. This causes the Mbox to "fake" an ACV/TNV violation by 
qualifying the returned data with M%MME_FAULT_H. The trap on this reference is taken when 
the Ebox references the operand. 

Note that when the Mbox "fakes" an ACV/TNV/M=0 violation, the MME_DATAPATH does 
not invoke a memory management response to either an ACV/TNV/M=0 problem or to a 
TB_MISS. Further, no state update is performed for either the MMESTS or MMEADR. 
Thus, these registers still record the true ACV/TNV error. 

• If the faulting reference is a DEST_ADDR, an ACV/TNV/M=0 bit in the PA_QUEUE is set 
in the corresponding PA_QUEUE entry. When the Ebox microcode checks for the validity 
of the PA_QUEUE in order to send the corresponding STORE data, the Ebox detects the 
ACV/TNV/M=0 condition and generates the microtrap. 

The PA_QUEUE hardware must guarantee that the first PA_QUEUE entry of an unaligned 
pair of entries is marked with the ACV/TNV/M=0 condition regardless of which of the 
two references caused the fault. This is necessary so that the microcode takes the proper 
action at the start of the reference. 

If an ACV length violation or a TNV fault is generated on an Mbox PTE reference, the original 
reference (i.e. the reference that caused the memory management sequence which generated the 
PTE reference) must be marked as having an MME fault associated with it. Thus, when the 
original reference is retried after the memory management sequence completes, the reference 
will be treated as if the MME fault was associated with it. Note that the MMESTS register 
records the fact that the actual fault was associated with the PTE reference and not the original 
reference. 

12.5.1.5.3.8 Microcode Processing of ACV/TNV/M=0: 

The NVAX macropipeline design can cause synchronization problems related to operating system 
processing of PTEs. The SRM states that "software is not required to flush TB entries after 
changing PTEs that were already invalid." Consider the case where an Ibox read prefetches 
an invalid PTE from a page table. Just after this read, the Ebox completes the previous 
macroinstruction by updating, validating and writing the same PTE back to memory. When 
the Ebox references the prefetched PTE operand, an invalid TNV fault will be generated because 
the PTE has just been validated. 

To prevent this scenario from occurring, the memory management fault microcode must re-test 
for fault conditions before invoking the actual fault sequence. If no fault is detected at this 
time, no fault processing occurs. Microcode re-tests the fault conditions by first asserting 
E%FLUSH_MBOX_H, which unlocks MMESTS and clears pending Mbox references. Following this, 
the microcode reads the fault address from MMEADR via an IPR_RD command and then issues 
a TBIS command corresponding to this faulting reference. The TBIS will clear out the potentially 
out-of-date PTE in the TB which is associated with the fault. The microcode will then issue a 
PROBE command to the same address. The PROBE will cause the updated PTE to be cached 
in the TB (unless a TNV fault is detected) and will record the new fault status in MMESTS 
and return the status to the Ebox. Note that the PROBE command does not lock MMESTS. If 
the microcode detects a valid fault upon reading the PROBE status, microcode fault processing 
continues. Otherwise, the instruction is restarted without causing a memory management fault. 
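
The re-test flow described above can be summarized in a small behavioral model. This is only a sketch under stated assumptions: the `Mbox` class, its method names, and the dictionary-based page table are hypothetical stand-ins for the hardware, and only the TNV case is modeled (ACV checks are omitted).

```python
from dataclasses import dataclass, field


@dataclass
class Mbox:
    """Hypothetical, highly simplified Mbox model (not the real hardware)."""
    page_table: dict                          # virtual page -> PTE valid bit in memory
    tb: dict = field(default_factory=dict)    # virtual page -> cached valid PTE
    mmeadr: int = 0                           # faulting virtual page latched at fault time
    mmests_locked: bool = False

    def flush(self):
        """Models the effect of E%FLUSH_MBOX_H: MMESTS is unlocked."""
        self.mmests_locked = False

    def tbis(self, vpage):
        """Invalidate a single TB entry, if present."""
        self.tb.pop(vpage, None)

    def probe(self, vpage):
        """Return True if a TNV fault is still present. Does not lock MMESTS."""
        valid = self.page_table.get(vpage, False)
        if valid:
            self.tb[vpage] = True             # only a valid PTE is cached in the TB
        return not valid


def retest_fault(mbox):
    """Return 'fault' if the fault is real, 'restart' if it was stale."""
    mbox.flush()
    vpage = mbox.mmeadr                       # IPR_RD of MMEADR
    mbox.tbis(vpage)                          # drop a possibly out-of-date PTE
    return "fault" if mbox.probe(vpage) else "restart"
```

In this model, a PTE that software validated after the Ibox prefetch produces `'restart'`, which corresponds to restarting the instruction without a memory management fault.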

If a real ACV or TNV fault was detected, the microcode re-reads MMESTS to get the updated status based on
the last PROBE operation. The microcode constructs and pushes the memory management fault 
stack frame consisting of the fault status, the contents of MMEADR, the PC of the corresponding 
instruction, and the PSL at the time of the fault. The microcode then reads the appropriate 
SCB (System Control Block) vector corresponding to either the ACV or TNV fault. Based on 
this vector, the microcode sets the appropriate CPU execution mode and redirects the PC to 
the appropriate operating system memory management macrocode fault handler. This software 
fault handler reads the fault status and the faulting address from the stack and processes the 
ACV or TNV fault based on this information. Once the fault is processed, an REI is executed, the 
macropipeline is flushed, and normal instruction processing resumes by restarting the instruction 
that originally caused the fault. 

If the microcode read MMESTS and determined the fault to be an M=0 condition, the microcode
processes the fault without the aid of operating system software. To do this, the microcode performs
the following actions:

1. A TBIS command is issued which references the faulting address. This reference will cause 
the PTE, which was used to detect the M=0 fault, to be invalidated from the TB. 

2. The microcode will then test the faulting address to determine whether it was a process or 
system space reference. If it was a system space reference, the corresponding SPTE address 
must be a physical address. If it was a process space reference, the corresponding PPTE 
address must be a virtual address. 

3. The microcode then issues a DREAD using the PTE address it read from MMEPTE. 
If the microcode determined the PTE to be an SPTE, the read is issued with 
M_QUE%S5_QUAL_H<6>=0 indicating a physical read. If the microcode determined the PTE to
be a PPTE, the read is issued with M_QUE%S5_QUAL_H<6>=1 and M_QUE%S5_QUAL_H<2>=1
indicating a virtual read with ACV and M=0 checks disabled because the Mbox must not 
perform M=0 checks and ACV protection checks on PTE references. 

4. When the PTE data is received, the Ebox sets the modify bit of the PTE indicating that
the corresponding page has been written. The new PTE is then written back into the page table
in memory by issuing a physical WRITE or a virtual write with ACV/M=0 checks disabled,
depending on the physical or virtual nature of the PTE. 

5. The microcode then flushes the macropipeline and resumes normal instruction processing by 
restarting the instruction corresponding to the M=0 fault. 

Note that when the address which caused the M=0 fault is restarted after the M=0 fault was 
serviced, the Mbox will generate a TB_MISS condition since the old PTE was invalidated from the 
TB. Subsequently, a TB_MISS sequence will be invoked which will cause the new PTE to be read 
into the Mbox and cached in the TB. 
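
The data manipulation at the heart of the M=0 sequence is small enough to sketch. The following is a minimal model, assuming memory and the TB are plain dictionaries and using a hypothetical function name; the modify bit position (bit 26) follows the VAX architecture PTE format. The physical versus virtual addressing of the PTE (step 2) is not modeled.

```python
PTE_M = 1 << 26   # modify bit of a VAX PTE (PTE<26> per the VAX architecture)


def service_m0_fault(memory, tb, vpage, pte_addr):
    """Set the modify bit of the PTE at pte_addr and drop its TB copy.

    memory: dict mapping PTE address -> 32-bit PTE value (hypothetical model)
    tb:     dict mapping virtual page -> cached PTE value (hypothetical model)
    """
    tb.pop(vpage, None)                   # step 1: TBIS on the faulting address
    pte = memory[pte_addr]                # step 3: DREAD of the PTE
    memory[pte_addr] = pte | PTE_M        # step 4: write the PTE back with M set
    return memory[pte_addr]
```

Because the TB entry is removed, the restarted reference misses in the TB and the updated PTE is fetched by the normal TB_MISS sequence.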

12.5.1.5.3.9 Pipeline Implications of ACV/TNV/M=0 Condition

12.5.1.5.3.9.1 Pipeline Effects for MME Faults on Write References 

If an ACV, TNV or M=0 condition occurs on a write reference, the faulting write is transformed 
into a NOP command in the S6 pipe. Thus, the Pcache and Bcache are prevented from modifying 
any memory state as a result of a memory management fault detected in S5. 

12.5.1.5.3.9.2 Pipeline Effects for MME Faults on Read References 

If the faulting reference is a read, the read must be prevented from leaving the Mbox pipe since a 
read to I/O space could cause detrimental state changes. This is handled by forcing the deassertion 
of M%CBOX_REF_ENABLE_L, which causes the Cbox to ignore the read.

12.5.1.5.3.9.3 Pipeline Effects of E%FLUSH_MBOX_H on MME State

A more subtle implication involving the NVAX macropipeline exists which affects updating 
recorded Mbox MME state. Since the MME_SEQ executes independently of the Ebox microcode, 
the MME_SEQ must appropriately synchronize to Ebox execution such that MME state will not 
be updated for references that will never be processed by the Ebox. 

Consider the following situation. A TB_MISS sequence has begun on a specifier reference. During
this sequence, the Ebox detects a branch mispredict which causes redirection of the processing 
stream. As the PTE data is returned to the Mbox, a TNV condition is detected. This TNV must 
not be recorded because it corresponds to a reference which the Ebox will not see due to the 
redirection of the execution stream. 

From the Mbox point of view, handling this scenario can be generalized as follows. If the Mbox
receives an E%FLUSH_MBOX signal during any memory management sequence which may update
MME state, one of three possibilities will happen:

1. If E%FLUSH_MBOX is received after MME state has been updated, E%FLUSH_MBOX will unlock 
MMESTS so that only MME state corresponding to the redirected execution stream will be 
recorded. 

2. If E%FLUSH_MBOX is received during the cycle that an MME state update is being done,
the functional effect of E%FLUSH_MBOX will predominate, thus causing the MMESTS to be
unlocked.

3. If E%FLUSH_MBOX is received before the state update, MMESTS will be cleared by
E%FLUSH_MBOX and a state bit will be set which will inhibit any MME state updates during
the remaining MME sequence.
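
The three timing cases can be illustrated with a small cycle-level model. This is a sketch, not the actual logic: the class and method names are hypothetical, and the model assumes the inhibit state bit is set on every flush and cleared when the memory management sequence ends.

```python
class MmeState:
    """Hypothetical model of MMESTS locking during one MME sequence."""

    def __init__(self):
        self.locked = False     # True when MMESTS holds a recorded fault
        self.inhibit = False    # state bit that blocks later updates

    def cycle(self, update=False, flush=False):
        """Advance one cycle with an optional state update and/or flush."""
        if flush:
            # The flush predominates, even over a same-cycle update (case 2).
            self.locked = False
            # Assumption: suppress any later update in this sequence (case 3).
            self.inhibit = True
        elif update and not self.inhibit:
            self.locked = True  # fault state recorded and locked

    def end_sequence(self):
        """Assumed behavior: the inhibit bit is cleared when the sequence ends."""
        self.inhibit = False
```

In all three cases MMESTS ends up unlocked, so only a fault from the redirected execution stream can be recorded afterwards.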

Note that the analogous problem exists when processing a memory management sequence on an 
IREAD when I%FLUSH_IREF_LAT_H is asserted. In this case, the following three possibilities can 
occur: 

1. If I%FLUSH_IREF_LAT_H is asserted when MMESTS contains a validated fault on an IREAD,
I%FLUSH_IREF_LAT_H will unlock MMESTS.

2. If I%FLUSH_IREF_LAT_H is asserted during the cycle that an MME state update is being done
on an IREAD reference, the functional effect of I%FLUSH_IREF_LAT_H will predominate, thus
causing MMESTS to be unlocked.

3. If I%FLUSH_IREF_LAT_H is received before an MMESTS update but during a memory
management fault sequence invoked from an IREAD, MMESTS will be cleared by
I%FLUSH_IREF_LAT_H and a state bit will be set which will inhibit the subsequent MME state
update.

Note that while a special state bit is necessary to synchronize MME updates with Ebox execution 
stream redirection, no special mechanism is required to keep TB state synchronized. There 
are two reasons for this. First, the TB never validates a PTE whose PTE valid bit is clear. 
Secondly, the Mbox arbitration logic prevents Ebox references such as TBIS, TBIP, and TBIA 
from executing when a memory management sequence is executing. Therefore, TB state updates 
are always serialized with respect to TB invalidates generated by the Ebox microcode. 

12.5.1.5.3.9.4 Pipeline Effects of E%FLUSH_MBOX_H on M%MME_TRAP_L

Just as E%FLUSH_MBOX_H must be examined in order that MME state remains synchronized
to Ebox execution, E%FLUSH_MBOX_H must also be factored into the logic which generates 
M%MME_TRAP_L. This prevents the following scenario from occurring. If the Ebox has issued a 
DREAD which misses in the Pcache as a result of a MOVC instruction, the Mbox will propagate 
the reference forward to the Cbox. While the read is pending, the Ebox issues an MME_CHK 
command which TB misses causing the Mbox to initiate a TB miss sequence. During this 
sequence, the Cbox returns the read data qualified by C%CBOX_HARD_ERR_H. This causes the 
Ebox to microtrap into the error handler resulting in the assertion of E%FLUSH_MBOX_H. If the 
Mbox were to subsequently assert M%MME_TRAP_L based on a memory management fault on the
MME_CHK command, the Ebox would microtrap out of the error handler and initiate an MME fault
sequence that should never occur.

Thus, the assertion of E%FLUSH_MBOX_H during a memory management sequence inhibits
the assertion of M%MME_TRAP_L during that cycle or any subsequent cycles of the memory 
management sequence. 

12.5.1.5.4 Cross Page Sequence 

When an unaligned virtual reference falls across a page boundary, ACV/TNV/M=0 checks must 
be performed on both pages before the Mbox can determine if the reference passes or fails ACV 
checks. The function of the cross-page sequence is to generate an MME_CHK reference to check 
the second page (i.e. the upper page) for ACV/TNV/M=0 problems. As long as the MME_CHK
clears memory management checks before the reference is allowed to execute, the reference can
be processed in the normal manner because ACV/TNV/M=0 checks on the first page (i.e. the
lower page) will naturally occur as they do on all virtual references. If an ACV/TNV problem is
found on either page, an ACV/TNV condition is flagged for the reference. 

When the cross-page detection logic flags a cross-page condition, the following cross-page sequence 
is invoked: 

• cycle 1: The cross-page condition is detected. The S5 reference is aborted. The MME_ADDR 
latches the M_QUE%S5_VA_H address.

• cycle 2: The MME_DATAPATH adds 512 to the address in MME_ADDR. The resulting
address is guaranteed to fall into the upper page of the original reference for all byte, word,
longword and quadword references. This address is loaded into the MME_LATCH qualified
by an MME_CHK command. The MME_CHK reference (with DL=byte) will perform memory 
management checks on the upper page. 

• cycle 3: The MME_CHK is executed in S5 (assuming no Cbox reference took priority). If 
a TB_MISS occurs, the TB_MISS sequence is first invoked to obtain the proper translation. 
Once the TB has been updated based on the TB_MISS, the original MME_CHK reference will 
be restarted and the cross-page sequence will be re-invoked from the beginning. 

When the translation of the MME_CHK reference has properly occurred, ACV/M=0 checks are 
performed (note that TNV checks are only performed when the PTE is to be filled into the TB). If an
ACV/TNV/M=0 fault is detected during the MME_CHK processing, M_QUE%S5_QUAL_H<0> of
the original reference, which caused the cross-page sequence, is set. Thus, when this reference
is restarted, an MME fault will be reported. If no ACV/TNV/M=0 condition was detected on
the upper page, the original reference is marked as having passed the cross-page condition 
(M_QUE%S5_QUAL_H<5> is set). 

• cycle x: The original reference is restarted. If no ACV/TNV/M=0 fault occurred on the upper
page the reference executes normally without further cross-page checks. 

If the reference was marked as having an MME fault, the reference fault will be reported in 
the previously-described fashion (see Section 12.5.1.5.3.7). 

The cross-page sequence is only invoked on a virtual reference when memory management is 
enabled. 
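
The address arithmetic behind the cross-page sequence can be sketched as follows. The function names are hypothetical, the 512-byte page size is the VAX page size, and wrapping the sum to 32 bits is an assumption of this sketch.

```python
PAGE_SIZE = 512  # bytes per VAX page


def crosses_page(va, length):
    """True if an access of `length` bytes starting at `va` spans two pages."""
    return (va % PAGE_SIZE) + length > PAGE_SIZE


def upper_page_check_address(va):
    """Address used for the MME_CHK reference: the original address plus 512.

    Assumption: the result wraps to 32 bits.
    """
    return (va + PAGE_SIZE) & 0xFFFFFFFF
```

For example, a longword at byte offset 509 of a page crosses into the next page, while a longword at offset 508 does not. Adding 512 always lands in the next page regardless of the data length, which is why the MME_CHK can be issued with a byte data length.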

12.6 MBOX ERROR HANDLING

12.6.1 Types of Errors Handled 

The Mbox plays a role in the processing of the following types of errors:

• TB tag parity errors. 

• TB data parity errors. 

• Pcache tag parity errors. 

• Pcache data parity errors. 

• Errors encountered by the Cbox while processing a memory read, I/O space read, or IPR_RD 
which were transferred from the Mbox to the Cbox. Note that these errors could originate 
from the Bcache, NDAL or memory subsystem. 

All other possible errors are handled without Mbox involvement. 

12.6.2 TB parity error detection 

12.6.2.1 TB tag parity error detection 

Conceptually, a single bit of even parity representing TB tag parity is stored in each TB entry. 
Whenever a valid tag entry matches the S5 virtual page address, the corresponding tag parity 
data is accessed and driven out of the TB array for a subsequent parity check. Thus tag parity 
errors are only detected on the entry which causes a TB hit condition. 

The tag parity value against which the stored parity data is compared is calculated in parallel
with the TB access by using the virtual page address found on M_QUE%S5_VA_H<31:9>. This
scheme eliminates the need to drive out the matched tag entry in order to calculate parity.
If the tag matched the virtual page address, then the correct parity value can be derived from
M_QUE%S5_VA_H<31:9> instead of from the stored tag. This scheme is called predicted parity.
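
Predicted parity can be illustrated with a few lines of code. This is a minimal sketch with hypothetical function names; it assumes the even-parity bit is the XOR of the 23 virtual page address bits VA<31:9>.

```python
def even_parity(value):
    """Parity bit that makes the total count of 1s (value plus parity) even."""
    return bin(value).count("1") & 1


def predicted_tag_parity(va):
    """Parity predicted from the virtual page address bits VA<31:9>."""
    return even_parity((va >> 9) & 0x7FFFFF)


def tb_tag_parity_error(va, stored_parity):
    """On a TB hit, compare the stored tag parity with the predicted value."""
    return stored_parity != predicted_tag_parity(va)
```

Because a hit implies that the stored tag equals VA<31:9>, the parity computed from the address matches the parity of an uncorrupted tag. A tag with a single flipped bit that happens to hit therefore always shows a mismatch.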

Tag parity in a fully associative cache can cause several different failure modes since the tag 
state directly determines which entry (or entries) are selected during each TB access. Assuming 
a single bit soft failure occurs in a single TB tag (i.e. a tag bit accidentally toggles due to some
transient failure mode), three failure modes are possible:

1. A single bit tag error can cause no TB entry to match because the tag no longer compares 
with the virtual page address that it should have compared to. Thus, a TB_MISS condition 
is generated which causes the PTE data to be accessed from memory. This PTE data, along 
with its corresponding tag, will be written into a TB entry. In effect, this scenario causes the 
single bit tag error to remain undetected, but does not corrupt the virtual address translation 
process. 

2. A single bit tag error may cause exactly one TB entry to match because the incorrect tag 
entry happens to match a virtual page address which is not already cached in the TB. In 
this situation, the tag parity read out of the TB is guaranteed not to match the virtual page 
address parity. Thus a TB tag parity error will be correctly detected. 

3. A single bit tag error may cause two TB entries to match because the incorrect tag entry 
happens to match a virtual page address which is already cached in the TB. Thus, the correct 
tag entry detects a match at the same time as the incorrect tag entry detects a match. 
Due to the wired OR function implicit in accessing data off of a shared bit line within the TB
array, it is possible that the tag parity read out of the array matches the parity of the virtual
page address causing no tag parity error to be detected. In this case, the wired OR function on
the PTE bit lines will OR the two accessed PTE entries together causing an incorrect PTE to
be read out. If an even number of PTE bits were corrupted by the simultaneous PTE access,
the parity logic associated with the PTE data will not detect a problem. This is a disastrous
situation to the currently-executing CPU process because the TB will produce an incorrect 
translation without producing a parity error. 

As a result of the undetected fatal parity error discussed in this third case, a single bit of tag 
parity is stored in both its true and complement form in each TB entry. For a single entry 
match, these two parity lines always produce a "01" or "10" value. Due to the wired OR access, a 
two-entry TB match due to a single bit tag parity error produces a "11" parity access indicating
a multiple tag match and a tag parity error. 
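
The effect of storing parity in true and complement form can be shown with a short model of the wired OR. The function names are hypothetical, and each matching entry is represented only by its stored (true) parity bit.

```python
def read_parity_lines(matching_entries):
    """Wired-OR the (true, complement) parity pairs of all matching entries.

    matching_entries: list of stored true-parity bits, one per matching entry.
    """
    true_line = 0
    comp_line = 0
    for parity in matching_entries:
        true_line |= parity          # true form of the stored parity
        comp_line |= parity ^ 1      # complement form of the stored parity
    return true_line, comp_line


def multiple_match(matching_entries):
    """A "11" reading indicates a multiple tag match and a tag parity error."""
    return read_parity_lines(matching_entries) == (1, 1)
```

A tag corrupted by a single bit flip now equals a tag whose parity is the opposite of its own stored parity, so the two matching entries drive opposite values and both lines read as 1.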

TB tag parity is written along with the tag during a TB_TAG_FILL operation. 

12.6.2.2 TB data parity error detection 

Data parity error detection is conceptually simpler than tag parity detection. When a TB hit 
condition occurs the accessed PTE data is driven out of the TB along with the corresponding 
stored data parity. Parity is then calculated on the data and compared with the stored parity. A 
miscompare results in a TB data parity error. TB data parity is a single bit corresponding to the 
entire stored PTE field. 

TB data parity is written along with the PTE data during a TB_PTE_FILL operation. 

12.6.3 Pcache parity error detection 

12.6.3.1 Pcache tag parity error detection 

Pcache tag parity is stored and checked as a single bit representing even parity across the entire 
20-bit tag field. Unlike the TB implementation, however, true and complement versions of single
bit tag parity are not implemented; only the true version is implemented.

There are two separate aspects to Pcache tag parity error detection. The first aspect employs 
the "predicted parity" scheme which was used for the TB. However, the Pcache does not use 
predicted parity to directly detect tag parity errors. Instead, predicted tag parity is factored into 
the Pcache hit logic such that a Pcache miss will be forced if the tag parity does not agree with 
the parity calculated on the input address. By doing so, the tag parity design does not have to 
handle the case of a Pcache hit causing data to be returned to the Ibox, Ebox or Mbox in the 
presence of a Pcache tag parity error. Pcache predicted tag parity works by generating parity 
on M%S6_PA_H<31:12> at the same time as the Pcache access is taking place. If a validated 
tag matches the address on M%S6_PA_H<31:12>, but the tag parity does not match the predicted
parity, a Pcache miss is forced. 
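
The way predicted parity is folded into the hit decision can be sketched for a single tag. The function name and argument layout are hypothetical; the 20-bit tag taken from PA<31:12> and the even-parity convention come from the text above.

```python
def pcache_hit(pa, tag, tag_valid, tag_parity):
    """Hit only if the tag is valid, matches PA<31:12>, and its stored
    parity agrees with the parity predicted from the address."""
    addr_tag = (pa >> 12) & 0xFFFFF
    predicted = bin(addr_tag).count("1") & 1
    return bool(tag_valid and tag == addr_tag and tag_parity == predicted)
```

With this arrangement a tag parity mismatch looks like an ordinary miss to the rest of the pipeline, so no data is returned from an entry whose tag is suspect.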

The second aspect of Pcache tag parity error detection explicitly detects the tag error condition 
after the Pcache access has completed. Both banks of the tag store have their own tag parity 
generator. When both tags of the addressed Pcache index are driven out of the tag store, the two 
parity generators calculate tag parity based on the two accessed tags. These calculated values are 
compared to the corresponding stored tag parity which was accessed from the tag store with the 
tag data. If a miscompare occurs, a tag parity error is flagged. Note that this mechanism allows
a miscomparing tag to be flagged as a tag parity error while the other tag may simultaneously
generate a Pcache hit or miss.

Pcache tag parity is checked on both tags on all Pcache I-stream read operations only when 
I_ENABLE=1 and FORCE_HIT=0 in the PCCTL. Pcache tag parity is checked on both tags on 
all Pcache D-stream read and write operations only when D_ENABLE=1 and FORCE_HIT=0
in the PCCTL. When FORCE_HIT=1, tag parity is never checked. Pcache tag parity is never
checked on an IPR_RD operation to a Pcache tag. Tag parity is written on a cache fill operation 
or on an IPR_WR to a Pcache tag. 

12.6.3.2 Pcache data parity error detection 

Byte parity is maintained for each Pcache hexaword block. Therefore, each block contains 32 bits 
of parity: one bit of even parity for each byte of data.

Pcache data parity is checked on the same conditions as Pcache tag parity checks except for two 
differences: 

1. Unlike tag parity, Pcache data parity errors are only detected during a Pcache hit condition. 
One exception to this rule exists, though. If the Pcache force hit condition exists due to a
memory management fault or hard fault, then Pcache data parity is not checked in spite of 
the Pcache hit condition. 

2. Unlike tag parity, data parity is written into the array during a Pcache write operation rather 
than checked. M%S6_BYTE_MASK_H<7:0> enables writing data parity into the Pcache in the
same manner as M%S6_BYTE_MASK_H<7:0> enables writing data into the Pcache. Therefore, 
each data parity bit is only updated as its corresponding byte of data is updated in the Pcache 
array. 

The Pcache data parity check begins following the completion of the Pcache read access. Correct 
parity is generated on all eight data bytes read out of the Pcache. Each bit of generated data 
parity is compared to its corresponding stored parity. If one or more mismatches are found, a
Pcache data parity error has occurred. Note that the parity check is independent of which bytes 
of the eight accessed bytes were actually requested by the read reference. Therefore, a Pcache 
data parity error can occur even though the requested bytes of data have correct parity. 
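
The byte-parity write and check rules can be sketched for one quadword of a Pcache block. The function names and the list-based data layout are hypothetical; the byte mask plays the role of M%S6_BYTE_MASK_H<7:0>.

```python
def byte_parity(b):
    """Even-parity bit for one byte of data."""
    return bin(b & 0xFF).count("1") & 1


def write_quadword(data, parity, new_bytes, byte_mask):
    """Update only the bytes enabled by byte_mask, along with their parity bits."""
    for i in range(8):
        if (byte_mask >> i) & 1:
            data[i] = new_bytes[i]
            parity[i] = byte_parity(new_bytes[i])


def data_parity_error(data, parity):
    """All eight bytes are checked, regardless of which bytes were requested."""
    return any(byte_parity(d) != p for d, p in zip(data, parity))
```

Because the check covers all eight bytes, a bad parity bit on an unrequested byte still reports an error, as noted above.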

12.6.4 Recording Mbox errors 

When any hard error is detected within the system, the error is recorded in one of many error 
status registers located throughout the NVAX system. When the operating system error handler 
routine is invoked from a microtrap or interrupt, the handler can read the state of all the error 
registers through IPR_RD operations to determine what error or errors were present when the 
error handler was invoked. 

The Mbox contains four of these error registers. Two are used to record TB parity errors and the 
other two are used to record Pcache parity errors. 

12.6.4.1 TBSTS and TBADR

The TB status register is shown below: 

Figure 12-56: IPR ED (hex), TBSTS

Table 12-23: TBSTS Field Descriptions

Name    Extent  Type  Description

LOCK    0       WC    Lock Bit. When set, validates TBSTS contents and prevents any
                      other field from further modification. When clear, indicates that no
                      TB parity error has been recorded and allows TBSTS and TBADR
                      to be updated.

DPERR           RO    Data Error Bit. When set, indicates a TB data parity error.

TPERR           RO    Tag Error Bit. When set, indicates a TB tag parity error.

EM_VAL          RO    EM_LATCH valid bit. Indicates if EM_LATCH was valid at the time
                      of the TB parity error detection. This helps the software error
                      handler determine if a write operation may have been lost due to
                      the TB parity error.

CMD     8:4     RO    S5 command corresponding to TB parity error.

SRC     31:29   RO    Indicates the original source of the reference causing TB parity error.



Table 12-24: SRC Encodings

Defined SRC values  Definition

110                 valid IREAD error is stored
100                 valid Ibox specifier reference error is stored
000                 valid Ebox reference error is stored



See Figure 12-27 for the format description of TBADR. 

When a TB parity error is detected with LOCK=0, TBADR is loaded with the virtual address 
which caused the TB parity error, and all fields of TBSTS are updated to record the nature of 
the TB parity error. Note that both the TPERR and DPERR bits can be set at the same time if 
these two error conditions occurred during the same cycle. When a TB parity error is recorded, 
the LOCK bit is set to validate the contents of both TBSTS and TBADR registers. When LOCK 
is set, all bits of both registers are frozen and cannot be changed until the LOCK bit is cleared. 
Thus, any subsequent error is not recorded if LOCK=1.

When the operating system error handler is invoked, TBSTS and TBADR will be read through an
IPR_RD command in order to determine if any TB parity errors were recorded. If the state of the 
LOCK bit was read to be a zero, then no error has occurred and the remaining state information 
in these two registers is invalid. If the LOCK bit was found to be set, then the remaining error 
state of these two registers characterizes the nature of the recorded error. 

Once the error handler has read these registers, it re-enables TBSTS to record any new errors by 
clearing the LOCK bit. Clearing the LOCK bit is accomplished by writing a "1" to LOCK through 
an IPR_WR operation. 
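
The lock and write-one-to-clear behavior can be modeled in a few lines. This is a sketch with hypothetical class and method names; only the LOCK bit position (bit 0) and the SRC field position (bits 31:29) are taken from the tables above.

```python
class ErrorRegisterPair:
    """Hypothetical model of a status/address register pair such as TBSTS/TBADR."""

    LOCK = 1 << 0

    def __init__(self):
        self.status = 0
        self.address = 0

    def record(self, status_bits, address):
        """Record an error unless the pair is already locked."""
        if self.status & self.LOCK:
            return False                      # a later error is not recorded
        self.status = status_bits | self.LOCK
        self.address = address
        return True

    def ipr_write_status(self, value):
        """IPR_WR to the status register: writing a 1 to LOCK clears it."""
        if value & self.LOCK:
            self.status &= ~self.LOCK
```

A handler would read both registers, test LOCK to decide whether the remaining fields are meaningful, and then write a 1 to LOCK so that new errors can be recorded.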

12.6.4.2 PCSTS and PCADR 

The PCSTS register is shown below: 



Figure 12-57: IPR F4 (hex), PCSTS 

Table 12-25: PCSTS Field Descriptions

Name        Extent  Type  Description

LOCK        0       WC    Lock Bit. When set, validates PCSTS<8:1> contents and prevents
                          modification of these fields. When clear, invalidates PCSTS<8:1>
                          and allows these fields and PCADR to be updated.

DPERR       1       RO    Data Error Bit. When set, indicates a Pcache data parity error.

RIGHT_BANK  2       RO    Right Bank Tag Error Bit. When set, indicates a Pcache tag parity
                          error on the right bank.

LEFT_BANK   3       RO    Left Bank Tag Error Bit. When set, indicates a Pcache tag parity
                          error on the left bank.

CMD         8:4     RO    S6 command corresponding to Pcache parity error.

PTE_ER_WR   9       WC    Indicates a hard error on a PTE DREAD which resulted from a TB
                          miss on a WRITE.

PTE_ER      10      WC    Indicates a hard error on a PTE DREAD.



The PCSTS and PCADR record Pcache tag and data parity errors. The function and operation
of these registers are identical to the TBSTS and TBADR registers except that the PCADR stores
physical quadword addresses rather than virtual byte addresses, and the PCSTS also records PTE hard
error events. The definitions of these registers are shown in Figure 12-29 and Figure 12-30.
Note, however, that when PCSTS<0> is set, Pcache memory reads, writes and invalidates are
disabled.

The PCSTS is a partial misnomer in that it also records hard error state associated with fatal 
errors occurring on Mbox PTE DREAD references. These hard errors have nothing to do with 
Pcache parity errors; however, they are included in PCSTS for implementation simplicity.

The PTE_ER bit of PCSTS is set whenever the Cbox has returned fatal error status on a
requested PTE DREAD. The PTE_ER_WR bit of PCSTS is set whenever the Cbox has returned
fatal error status on a requested PTE DREAD which was due to a TB miss on a WRITE reference.
Both of these bits may be set independently of the LOCK bit of PCSTS. Further, the state of these
bits is always valid regardless of the state of the LOCK bit. These two bits can only be cleared
by a write-one-to-clear operation to each bit.
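
An error handler might unpack PCSTS along the following lines. The function names are hypothetical; the bit positions are those given in Table 12-25.

```python
def decode_pcsts(value):
    """Split a 32-bit PCSTS value into its named fields."""
    return {
        "LOCK": value & 1,
        "DPERR": (value >> 1) & 1,
        "RIGHT_BANK": (value >> 2) & 1,
        "LEFT_BANK": (value >> 3) & 1,
        "CMD": (value >> 4) & 0x1F,
        "PTE_ER_WR": (value >> 9) & 1,
        "PTE_ER": (value >> 10) & 1,
    }


def parity_fields_valid(value):
    """PCSTS<8:1> is meaningful only when LOCK is set; PTE_ER and
    PTE_ER_WR are valid regardless of LOCK."""
    return bool(value & 1)
```

Note that the two PTE error bits must be examined even when LOCK is clear, since they are set independently of the parity error state.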

12.6.5 Mbox Error Processing 

12.6.5.1 Processing TB parity errors 

TB tag parity errors can be detected on all commands which cause a TB tag lookup to occur (See 
Section 12.6.5.4). TB data parity errors can be detected on all commands in which data can be 
read out of the TB (See Section 12.6.5.4). 

For hardware simplicity, the detection of any TB parity error will cause the Mbox to generate a 
hard error microtrap and will cause the faulting reference and all pending Ibox, Ebox and Mbox 
references to be cleared. Thus, any TB parity error is fatal in the sense that it is non-recoverable 
and will cause a machine check. 

The following describes the specific sequence of events which occur following the detection of a 
TB tag parity error, or a TB data parity error: 

1. If the TBSTS register is locked, TBSTS state is not updated. Assuming the TBSTS is not 
locked, the TB parity condition is recorded in the TBSTS and the associated virtual address 
is loaded into TBADR. TBSTS and TBADR are subsequently locked by setting TBSTS<0>. 

The Mbox asserts M%TB_PERR_TRAP_L to invoke a hard error microtrap. 

The valid bits of the IREF_LATCH, SPEC_QUEUE, EM_LATCH, VAP_LATCH, and
RTY_DMISS_LATCH are unconditionally cleared to eliminate all pending references which
might involve a subsequent TB operation.

2. The TB parity error detection causes the MME_DATAPATH to invoke the TB parity error
sequence. As a result, the MME_DATAPATH issues a TBIA command.

The reference which caused the TB parity error is transformed into a NOP command as it 
propagates into the S6 pipe. Thus, this reference will not modify any Pcache, Bcache or Cbox 
state. 

3. The TBIA command executes in S5, causing all TB entries to be invalidated and the NLU
pointer to be reset. All TB entries are invalidated rather than just the one which caused the 
parity error. This is done based on the premise that a single soft failure in the TB may affect 
more than one entry. Thus, each distinct soft failure will only be detected and reported once. 

12.6.5.2 Processing Pcache parity errors

Pcache tag parity errors can be detected on all commands which cause a Pcache tag lookup to 
occur (See Section 12.6.5.4). Pcache data parity errors can be detected on all commands in which 
data is read out of the Pcache (See Section 12.6.5.4). 

The strategy behind processing Pcache parity errors is to turn off the Pcache and let the Cbox 
process the reference from the Bcache or from main memory. Thus, in the absence of any
errors from the Cbox or memory subsystem, a Pcache parity error never causes an error fatal to 
the currently executing process. 

The following describes the specific sequence of events which occur following the detection of a 
Pcache tag parity error:

1. The Pcache tag parity error is recorded in PCSTS and the corresponding physical address is
recorded in PCADR. PCADR and PCSTS are subsequently locked by setting the LOCK bit of
PCSTS. Locking PCSTS automatically disables the Pcache from performing any subsequent
non-IPR operations.

The Mbox asserts M%MBOX_S_ERROR_H to flag an interrupt which will guarantee that the
parity error will be recorded as a soft error at some future time. 

If the Pcache operation is a write, the Cbox will automatically continue processing the 
reference independent of any parity error condition. In the case of read operations, the 
predicted parity mechanism guarantees that a Pcache miss condition will occur when a tag 
parity error is detected. Thus, M%CBOX_REF_ENABLE_L is asserted in response to the Pcache 
miss condition causing the Cbox to continue to process the read reference. 

The following describes the specific sequence of events which occur following the detection of a 
Pcache data parity error:

1. The Pcache data parity error is recorded in PCSTS and the corresponding physical address is
recorded in PCADR. PCADR and PCSTS are subsequently locked by setting the LOCK bit of
PCSTS. Locking PCSTS automatically disables the Pcache from performing any subsequent
non-IPR operations.

The Mbox asserts M%MBOX_S_ERROR_H to flag an interrupt which will guarantee that the
parity error will be recorded as a soft error at some future time. 

If the Pcache operation was a read in the absence of an outstanding fill operation, then 
M%CBOX_LATE_EN_H is asserted to inform the Cbox that it must continue to process the S6 
reference because of the Pcache data parity error. M%CBOX_LATE_EN_H may be asserted in
spite of the fact that M%CBOX_REF_ENABLE_L was deasserted earlier in the cycle because
M%CBOX_REF_ENABLE_L is dependent on the Pcache hit condition but not on the parity error
detection. The Pcache read reference is loaded into the corresponding MISS_LATCH and the 
read is treated in subsequent cycles as a normal Pcache miss sequence. 

If the Pcache operation was a D-stream read which occurred during an outstanding fill
operation, M%CBOX_LATE_EN_H is not asserted because the Mbox and Cbox are unable to
handle another fill at this point. When the fill sequence completes, this reference will be
retried (from the RTY_DMISS_LATCH), and M%CBOX_LATE_EN_H will be issued. 

Note that M%CBOX_LATE_EN_H is never asserted during a Pcache write operation. 
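
The conditions governing M%CBOX_LATE_EN_H on a data parity error reduce to a small decision function. This is a sketch with hypothetical function names; it assumes that any read arriving during an outstanding fill is treated like the D-stream case described above.

```python
def cbox_late_en(is_read, parity_error, fill_outstanding):
    """True when M%CBOX_LATE_EN_H would be asserted in this simplified model."""
    return bool(is_read and parity_error and not fill_outstanding)


def needs_retry(is_read, parity_error, fill_outstanding):
    """True when the read waits (e.g. in RTY_DMISS_LATCH) until the fill completes."""
    return bool(is_read and parity_error and fill_outstanding)
```

Writes never assert the signal in this model, since the Cbox continues processing a write independent of any parity error condition.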

12.6.5.3 Processing Cbox errors on Mbox-Initiated read-like sequences

The Cbox detects errors that occur in the Bcache, NDAL or memory subsystem. When the Cbox 
detects one of these errors, and it is associated with an Mbox-initiated reference that requires 
data to be returned (e.g. memory read, I/O space read, or IPR read), the Mbox must transfer the 
error status of the reference back to the destination corresponding to the reference. The Mbox 
never records a Cbox-detected error in Mbox error registers because the error is logged in Cbox 
error registers. 

12.6.5.3.1 Cbox-detected ECC errors 

The Cbox returns requested data through an I_CF or D_CF command to the Mbox while
simultaneously checking the error-correction code to check for a possible Bcache error. If an ECC
error is found, the Cbox asserts C%CBOX_ECC_ERR_H. This causes the Mbox to latch a NOP in
the CBOX_LATCH rather than the cache fill. As a result, the Mbox does not perform any Pcache
state updates resulting from the bad data nor does it assert M%VIC_DATA_L, M%IBOX_DATA_L,
M%EBOX_DATA_H, or M%MBOX_DATA to indicate the presence of valid data.

During subsequent cycles, the Cbox will determine if the ECC error is correctable or not. If it 
is, the data will be corrected and returned. If the data is not correctable, a Cbox-detected hard 
error has occurred and will be dealt with as described below. 

Note that the ECC detection mechanism is what verifies the validity of the data. The Cbox does 
not send any parity information in order for the Mbox to check the validity of the received data. 

12.6.5.3.2 Cbox-detected hard errors on requested fill data 

If the Cbox has determined that the requested data cannot be returned for some reason, the 
Cbox drives a cache fill command qualified by C%CBOX_HARD_ERR_H. When this happens, the 
Mbox performs the following actions: 

1. The assertion of C%CBOX_HARD_ERR_H indicates to the Mbox that the cache fill data is invalid. 
Thus, the Mbox returns the invalid data on the M%MD_BUS_H in the same manner that all 
data is returned except that the data is further qualified by M%HARD_ERR_H. M%HARD_ERR_H 
informs the receiver that the data is invalid and that the requested data cannot be returned 
due to a hard error. 

2. Once the Cbox detects a hard error on the requested data, the Cbox immediately terminates 
the pending fill sequence by the assertion of C%LAST_FILL_H. Thus, no further data 
corresponding to the same fill sequence will be returned and the Mbox fill sequence 
corresponding to the error is terminated by invalidating the corresponding MISS_LATCH. 

3. An I_CF or D_CF command which is qualified by C%CBOX_HARD_ERR_H is interpreted by the 
Pcache as an INVAL command. Thus the invalid data is not filled in the Pcache. 

12.6.5.3.3 Cbox-detected hard errors on non-requested fill data 

The Cbox performs the same actions as described above to indicate the presence of a hard error 
regardless of whether the data is the requested data or just one of the other three pieces of fill 
data for the corresponding Pcache block. If the data is non-requested fill data, the Mbox performs 
the following actions: 

1. Once the Cbox detects a hard error on the non-requested data, the Cbox immediately 
terminates the pending fill sequence by the assertion of C%LAST_FILL_H. Thus, no further 
data corresponding to the same fill sequence will be returned and the Mbox fill sequence 
corresponding to the error is terminated by invalidating the corresponding MISS_LATCH. 

2. An I_CF or D_CF command which is qualified by C%CBOX_HARD_ERR_H is interpreted by the 
Pcache as an INVAL command. Thus the invalid fill data is not filled in the Pcache and 
all previous fills to the same block are invalidated. This is necessary in order to maintain 
coherency between the Pcache and Bcache because a Bcache data block will only be validated 
if all the data within the block is error-free. 
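
The Mbox response to a fill qualified by a hard error, for both requested and non-requested 
data, can be sketched as a small behavioral model. This is an illustrative Python sketch and 
not the chip logic: the names FillCommand and MboxFillModel and all field names are 
assumptions introduced for the example, and the retirement of the MISS_LATCH on a normal 
last fill is likewise assumed. 

```python
from dataclasses import dataclass, field


@dataclass
class FillCommand:
    """One I_CF or D_CF cycle from the Cbox (hypothetical model type)."""
    quadword: int     # position of this fill quadword within the block (0..3)
    requested: bool   # True when this quadword is the requested data
    hard_err: bool    # models C%CBOX_HARD_ERR_H
    last_fill: bool   # models C%LAST_FILL_H


@dataclass
class MboxFillModel:
    """Tracks one outstanding fill sequence (hypothetical model type)."""
    valid_quadwords: set = field(default_factory=set)
    miss_latch_valid: bool = True
    hard_err_returned: bool = False   # models M%HARD_ERR_H on M%MD_BUS_H

    def accept(self, cmd: FillCommand) -> None:
        if cmd.hard_err:
            # The Pcache treats the fill as an INVAL: nothing is filled and
            # earlier fills to the same block are invalidated.
            self.valid_quadwords.clear()
            # Only requested data is returned to the receiver flagged as bad.
            if cmd.requested:
                self.hard_err_returned = True
            # The Cbox ends the sequence, so the MISS_LATCH is invalidated.
            self.miss_latch_valid = False
            return
        self.valid_quadwords.add(cmd.quadword)
        if cmd.last_fill:
            # Assumed: a normal last fill also retires the MISS_LATCH.
            self.miss_latch_valid = False
```

In this model a hard error on a non-requested quadword leaves hard_err_returned clear while 
still emptying valid_quadwords, which mirrors the block invalidation described above. 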

12.6.5.3.4 Microcode Invocation on Cbox-detected Hard Errors 

When the Cbox indicates a hard error on requested read data, invalid data is driven on the 
M%MD_BUS_H qualified by M%HARD_ERR_H to indicate that the data is invalid due to a hard error. 
When the Ebox references the corresponding data a microtrap is generated by the Ebox to invoke 
the hard error microflow. 

If the hard error occurs on the address of the address of an operand (i.e. Ibox decoding a deferred 
specifier), the Mbox returns data qualified by M%HARD_ERR_H in the normal manner. However, 
in some instances, the Ibox must issue a second reference to the Mbox based on the address 
returned by the first reference. Due to the hard error however, the Ibox cannot issue a valid 
operand read since the data returned by the first reference was invalid. In this case, the Ibox 
issues a read qualified with the I%FORCE_HARD_FAULT_H signal. 

If this deferred specifier is a source operand, the Mbox "fakes" a hard error on this read by forcing 
a Pcache hit and by qualifying the returned data with M%HARD_ERR_H. The trap on this reference is 
taken when the Ebox references the operand. 

If this deferred specifier is a destination specifier, the Mbox sets the corresponding hard error 
bit in the PA_QUEUE. The hard error condition is then propagated to the Ebox through 
M%PA_Q_STATUS_H<2>. 
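
The two cases for a read qualified with I%FORCE_HARD_FAULT_H can be summarized in a short 
sketch. The function name and the action strings are hypothetical labels chosen for this 
example; they only restate the behavior described above. 

```python
def forced_hard_fault_actions(is_destination: bool) -> tuple:
    """Mbox actions for an Ibox read qualified with I%FORCE_HARD_FAULT_H."""
    if is_destination:
        # Destination specifier: the error travels with the PA_QUEUE entry.
        return ("set_pa_queue_hard_error_bit",
                "report_on_M%PA_Q_STATUS_H<2>")
    # Source operand: the error is faked on the returned data.
    return ("force_pcache_hit",
            "qualify_data_with_M%HARD_ERR_H")
```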

If a hard error is generated on an Mbox PTE reference, this fact is recorded in the PCSTS 
register (see Section 12.6.4.2), the tb miss sequence is immediately terminated, and the original 
reference (i.e. the reference that caused the memory management sequence which generated the 
PTE reference) is tagged as having the hard error associated with it. 

When the original reference is retried after the memory management sequence completes, the 
reference will be treated as if the hard error actually occurred on it. 

If the original reference was a read from the Ibox, the Mbox asserts M%HARD_ERR_H as it returns 
the invalid data to notify the Ibox or Ebox of the problem. The error handler will be invoked by 
the Ebox once the Ebox references the invalid data. The error handler will then read all error 
registers in the system to determine the nature of the error (note that the Cbox has recorded the 
physical PTE address of the fatal read). 

Hard errors on PTE DREADs resulting from a TB miss on a DEST_ADDR get reported through 
the M%PA_Q_STATUS_H<2> mechanism described above. 

Thus, any hard error on a PTE reference invoked by an Ibox reference will always be reported 
within the context of the executing instruction. However, fatal errors on PTE DREADs resulting 
from MME_CHK and WRITE references pose a more difficult problem than PTE errors resulting 
from reads. Since both of these references do not cause the Ebox to wait for a response from the 
Mbox, a more involved sequence is implemented in order to maximize the ability to report the 
fatal error within the context of the corresponding instruction execution. 

Thus, when a PTE error is detected on ANY Ebox reference except for PROBEs, the following 
sequence will take place: 

1. The Mbox will immediately assert M%MME_TRAP_L (unless the Ebox has previously asserted 
E%FLUSH_MBOX_H during the tb miss sequence). 

The MME sequencer will update MMEADR to record the original address of the reference 
which resulted in the tb miss sequence; it does not record the PTE address. The MME 
sequencer will update MMESTS<2> to indicate whether the original address had modify 
intent. The FAULT, PTE_REF, and LV fields of MMESTS are UNPREDICTABLE in this 
context. 

2. The assertion of M%MME_TRAP_L will cause the Ebox to immediately trap to the mme 
microflow. 

3. The mme microflow will examine MMESTS<2> and issue a PROBE command to the address 
in MMEADR to determine the nature of the mme fault. 

4. The PROBE will invoke another TB miss. If the PTE error does not reoccur, valid PROBE 
status will be returned to the Ebox indicating the absence or presence of a true mme fault. 
In this case, Ebox processing of the current instruction will continue with no consequences 
due to the transient hard error. 

If the PTE error does reoccur on the TB miss during PROBE processing, the PROBE status 
returned to the Ebox will be qualified with M%HARD_ERR_H indicating that a fatal error 
occurred during the PROBE reference. This will invoke the error handler within the context 
of the executing instruction. 
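
The possible outcomes of this sequence can be written down as a small decision function. 
This is only a sketch of the flow described in the steps above; the function name, its 
parameters, and the returned labels are hypothetical and chosen for the example. 

```python
def pte_error_outcome(ebox_flushed: bool, pte_error_recurs: bool,
                      true_mme_fault: bool) -> str:
    """Outcome after a PTE hard error on a non-PROBE Ebox reference."""
    if ebox_flushed:
        # E%FLUSH_MBOX_H was asserted during the tb miss sequence, so
        # M%MME_TRAP_L is not asserted.
        return "no_trap"
    # M%MME_TRAP_L traps the Ebox to the mme microflow, which issues a
    # PROBE to the address in MMEADR. The PROBE causes another TB miss.
    if pte_error_recurs:
        # PROBE status comes back qualified with M%HARD_ERR_H.
        return "hard_error_handler"
    if true_mme_fault:
        # Valid PROBE status reports a genuine memory management fault.
        return "mme_fault_reported"
    # The hard error was transient; the instruction proceeds normally.
    return "instruction_continues"
```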

12.6.5.4 Mbox Error Processing Matrix 

The following table summarizes all Mbox error handling. A blank entry in the table means that 
the corresponding error cannot occur for the given reference. 

Table 12-26: Mbox Error Handling Matrix 

                 TB tag        TB data       Pcache tag    Pcache data   Cbox hard 
Command          parity error  parity error  parity error  parity error  error 

Ibox references 

IREAD            A             A             B             D             F 
DREAD A 

DREAD_MODIFY A 

DEST_ADDR A 
STOP_SPEC_Q 



A 
A 
A 



D 
D 



Ebox references 



DREAD A 
DREAD_LOCK A 
STORE 

WRITE A 

WRITE_UNLOCK A 

IPR_RD (to Pcache) 

IPR_RD (non-Mbox) 

IPR_WR (to Pcache) 

IPR_WR (non-Mbox) 

PROBE A 

MME_CHK A 

TB_TAG_FILL 

TB_PTE_FILL 

TBIS 

TBIP 

TBIA 

LOAD_PC 



A 
A 

A 
A 



A 
A 



F 



Mbox references 

PTE DREAD A A B D G 

TB_TAG_FILL 

TB_PTE_FILL A 

IPR_DATA 

MME_CHK A A 

Cbox references 

INVAL E 

D_CF H 
I_CF H 



LEGEND: 
A. 

• Mbox microtraps Ebox by assertion of M%TB_PERR_TRAP_L during cycle error was 
detected. 

• The faulting reference and all pending Ibox and Ebox references are blown away. 

• TBIA command is issued to invalidate entire TB. 

• TBSTS and TBADR are updated appropriately. 

B. 

• A Pcache miss condition is forced to occur on this read reference causing the assertion 
of M%CBOX_REF_ENABLE_L. This instructs the Cbox to continue processing the read 
reference. 

• M%MBOX_S_ERROR_H is asserted to post a soft error interrupt. 

• PCSTS and PCADR are updated appropriately (a side effect of this operation turns off 
the Pcache). 

C. 

• The Cbox continues to process the write reference, as is done on all write operations 
regardless of a Pcache parity error. 

• M%MBOX_S_ERROR_H is asserted to post a soft error interrupt. 

• PCSTS and PCADR are updated appropriately (a side effect of this operation turns off 
the Pcache). 

D. 

• M%CBOX_LATE_EN_H is asserted to instruct the Cbox to continue processing the reference 
which caused the Pcache parity error. 

• M%MBOX_S_ERROR_H is asserted to post a soft error interrupt. 

• PCSTS and PCADR are updated appropriately (a side effect of this operation turns off 
the Pcache). 

E. 

• The invalidate operation takes place in spite of the tag parity error because the invalidate 
is only a function of matching all tag bits. 

• M%MBOX_S_ERROR_H is asserted to post a soft error interrupt. 

• PCSTS and PCADR are updated appropriately (a side effect of this operation turns off 
the Pcache). 

F. 

• The Cbox indicated a hard error for a non-PTE read or IPR_RD operation by the assertion 
of C%CBOX_HARD_ERR_H and C%LAST_FILL_H. 

• If the hard error corresponded to the data explicitly requested by the Mbox reference, 
M%HARD_ERR_H qualifies M%MD_BUS_H data indicating to the M%MD_BUS_H receiver that 
a hard error occurred while accessing the requested data. 

• The fill sequence is immediately terminated by the assertion of C%LAST_FILL_H, and the 
entire Pcache block corresponding to the fill is invalidated. 

G. 

• The hard error detected by the Cbox on this Mbox-issued PTE DREAD is recorded in 
PCSTS. The tb miss sequence is immediately terminated. 

If the error resulted from an Ibox reference, the error is tagged back to the appropriate 
Ibox reference latch. The error is then signaled via M%HARD_ERR_H when the 
requested data is returned on M%MD_BUS_H, or is reported through M%PA_Q_STATUS_H<2> (for 
DEST_ADDR commands). 

If the original reference came from the Ebox, M%MME_TRAP_L is asserted (in all cases 
except for PROBE references). This will invoke the memory management fault handler 
in order to try to report the hard error within the context of the execution of the instruction 
(see Section 12.6.5.3.4 for more information). 

• The fill sequence is immediately terminated by the assertion of C%LAST_FILL_H, and the 
entire Pcache block corresponding to the fill is invalidated. 

H. C%CBOX_HARD_ERR_H was asserted by the Cbox during an I_CF or D_CF command. This is 
the mechanism by which the Cbox informs the Mbox of a hard error during a read or IPR_RD 
operation where the Cbox must return data. Thus, see the error responses specified by F and 
G for the error response within context of the original read operation. 

12.7 MBOX INTERFACES 

The Mbox passes data and/or control information to four other sections of the NVAX chip. These 
sections are: 1) Ibox, 2) Ebox, 3) Useq and 4) Cbox. This section will describe the interfaces to 
each of these sections. 

12.7.1 IBOX INTERFACE 

12.7.1.1 Signals from Ibox 

• I%IBOX_CMD_L<4:0>: Command field of reference sent by Ibox. 

• I%IBOX_ADDR_H<31:0>: Transfers addresses of Ibox references to Mbox. 

• I%IBOX_TAG_L<2:0>: Ebox reg file destination of reference sent by Ibox. 

• I%IBOX_AT_L<1:0>: Access type of reference sent by Ibox. 

• I%IBOX_DL_L<1:0>: Data length of reference sent by Ibox. 

• I%IBOX_REF_DEST_L<1:0>: Indicates the destination(s) of the requested Ibox reference. 

• I%IREF_REQ_H: When asserted, indicates that a valid IREAD reference is present on the 
I%IBOX_ADDR_H<31:0> bus. 

• I%SPEC_REQ_H: When asserted, indicates that a valid specifier reference is being issued to 
the Mbox. 

• I%FORCE_MME_FAULT_H: Indicates that the associated Ibox reference should be forced to "look" 
like a memory management fault from the Ibox point of view. 

• I%FORCE_HARD_FAULT_H: Indicates that the associated Ibox reference should be forced to 
"look" like a hardware fault from the Ibox point of view. 

• I%FLUSH_IREF_LAT_H: Indicates that any current IREAD sequence in Mbox should be 
immediately cleared. 

12.7.1.2 Signals to Ibox 

• M%SPEC_Q_FULL_H: Informs Ibox that the SPEC_QUEUE is full and cannot accept any new 
references. 

• M%LAST_FILL_H: Qualifies I_CF data being returned to Ibox. It indicates that this data is the 
last fill data for the current fill sequence. 

• M%MD_BUS_H<63:0>: Transfers data back to Ibox. 

• M%MD_BUS_QW_PARITY_L: Quadword parity for M%MD_BUS_H. 

• M%QW_ALIGNMENT_H<1:0>: Indicates the relative aligned quadword position of VIC fill data 
within the aligned hexaword. 

• M%VIC_DATA_L: When asserted, indicates that M%MD_BUS_H<63:0> contains VIC fill data. 

• M%IBOX_DATA_L: When asserted, indicates that M%MD_BUS_H<31:0> contains requested Ibox 
data. 

• M%IBOX_IPR_WR_H: When asserted, indicates that M%MD_BUS_H<31:0> contains Ibox IPR 
write data. 

• M%MME_FAULT_H: When asserted in conjunction with M%VIC_DATA_L or M%IBOX_DATA_L, 
indicates that data on M%MD_BUS_H is invalid and that the corresponding reference was 
associated with a memory management exception. 

• M%HARD_ERR_H: When asserted in conjunction with M%VIC_DATA_L or M%IBOX_DATA_L, 
indicates that data on M%MD_BUS_H is invalid and that the corresponding reference was 
associated with a hard error condition. 

12.7.2 EBOX INTERFACE 

12.7.2.1 Signals from Ebox 

• E%EBOX_CMD_H<4:0>: Command field of reference sent by Ebox. 

• E%VA_BUS_L<31:0>: Transfers addresses of Ebox references to Mbox. 

• E%WBUS_H<31:0>: Transfers data of Ebox references to Mbox. 

• E%EBOX_TAG_H<4:0>: Ebox reg file destination of reference sent by Ebox. 

• E%EBOX_AT_H<1:0>: Access type of reference sent by Ebox. 

• E%EBOX_DL_H<1:0>: Data length of reference sent by Ebox. 

• E%EBOX_VIRT_ADDR_H: Indicates whether address is virtual or physical. 

• E%MMGT_MODE_H<1:0>: Execution mode to be used for ACV checks on PROBE references. 

• E%CUR_MODE_H<1:0>: Execution mode to be used for ACV checks on all non-PROBE 
references. 

• E%EREF_REQ_H: When asserted, indicates that a valid Ebox reference is currently being 
issued. 

• E%EM_ABORT_L: Indicates that the current EM_LATCH reference should be disregarded. 

• E%FLUSH_MBOX_H: Indicates that certain references and reference state in the Mbox should 
be cleared (see Section 12.3.21.2). 

• E%FLUSH_PA_QUEUE_H: Indicates that the PA_QUEUE should be flushed (see 
Section 12.3.21.2). 

• E%START_EBOX_IO_RD_H: Indicates that the Ebox is md stalling on the corresponding 
SPEC_QUEUE read. If this SPEC_QUEUE read is an I/O space read and 
E%START_IBOX_IO_RD_H is not asserted, the read is aborted until it is asserted. 

• E%RESTART_SPEC_QUEUE_H: Indicates that Ebox has sent all explicit writes for the current 
instruction to the Mbox and, therefore, causes the SPEC_Q_SYNC_CTR to be incremented. 

• E%NO_MME_CHECK_H: Indicates that the corresponding EM_LATCH reference should not be 
tested for ACV or M=0 conditions. 

12.7.2.2 Signals to Ebox 

• M%EM_LAT_FULL_H: Indicates that EM_LATCH is currently full and cannot accept any new 
references. 

• M%PA_Q_STATUS_H<2>: Indicates that the corresponding address in the PA_QUEUE is 
associated with a hard error. 

• M%PA_Q_STATUS_H<1>: Indicates that the corresponding address in the PA_QUEUE is 
associated with a memory management exception. 

• M%PA_Q_STATUS_H<0>: Indicates that sufficient physical address data is present in the 
PA_QUEUE to initiate an Ebox STORE command. 

• M%MD_BUS_H<31:0>: Transfers data back to Ebox. 

• M%MD_TAG_H<4:0>: Ebox reg file destination of reference on M%MD_BUS_H<31:0>. 

• M%EBOX_DATA_H: When asserted, indicates that M%MD_BUS_H<31:0> contains requested Ebox 
data. 

• M%MME_FAULT_H: When asserted in conjunction with M%EBOX_DATA_H, indicates that data on 
M%MD_BUS_H is invalid and that the corresponding reference was associated with a memory 
management exception. 

• M%HARD_ERR_H: When asserted in conjunction with M%EBOX_DATA_H, indicates that data on 
M%MD_BUS_H is invalid and that the corresponding reference was associated with a hard error 
condition. 

• M%PMUX0_H: Mbox performance data signal (see Section 12.10). 

• M%PMUX1_H: Mbox performance data signal (see Section 12.10). 

12.7.3 INTERRUPT SECTION INTERFACE 
12.7.3.1 Signals to Interrupt Section 

• M%MBOX_S_ERROR_H: Indicates that the Mbox has logged a hard error in the PCSTS register 
and thus, is posting an interrupt. 

12.7.4 USEQ INTERFACE 
12.7.4.1 Signals to Useq 

• M%MME_TRAP_L: Indicates to the Useq that a memory management exception is to be invoked. 

• M%TB_PERR_TRAP_L: Indicates to the Useq that a tb parity error has been detected. 

12.7.5 CBOX INTERFACE 
12.7.5.1 Signals from Cbox 

• C%CBOX_CMD_H<1:0>: Command field of Cbox reference sent to Mbox. 

• C%CBOX_ADDR_H<31:5>: Hexaword address of Cbox reference sent to Mbox. 

• C%MBOX_FILL_QW_H<4:3>: Indicates the aligned quadword within the aligned hexaword. 

• C%REQ_DQW_H: Qualifies the current D_CF to indicate that this is the requested data. 

• B%S6_DATA_H<63:0>: Data of Mbox reference seen by Cbox. 

• C%S6_DP_H<7:0>: Even data parity corresponding to B%S6_DATA_H<63:0> during cache fill 
references. 

• C%LAST_FILL_H: When asserted, indicates that this is the last fill sent for the current 
sequence. 

• C%CBOX_HARD_ERR_H: When asserted while the Cbox is driving data onto the B%S6_DATA_H bus, 
it indicates that data on M%MD_BUS_H is associated with a non-recoverable hard error. 

• C%CBOX_ECC_ERR_H: Indicates that an ECC error is associated with the Cbox data being 
returned. 

• C%WR_BUF_BACK_PRES_H: Indicates that Cbox cannot accept any more entries in its write 
buffer. 

12.7.5.2 Signals to Cbox 

• M%S6_CMD_H<4:0>: Command field of Mbox reference seen by Cbox. 

• M%S6_PA_H<31:3>: Quadword physical address of Mbox reference seen by Cbox. 

• M%C_S6_PA_H<2:0>: Address within addressed quadword of Mbox reference seen by Cbox. 

• B%S6_DATA_H<63:0>: Data of Mbox reference seen by Cbox. 

• M%S6_BYTE_MASK_H<7:0>: Byte mask field of Mbox reference seen by Cbox. 

• M%CBOX_REF_ENABLE_L: Indicates that current S6 read reference packet should be latched 
and processed by the Cbox. This signal is a don't care on write operations. 

• M%CBOX_LATE_EN_H: Asserted at the end of a cycle to indicate that a Pcache parity error was 
detected. As a result, the Cbox must continue to process this reference regardless of what 
M%CBOX_REF_ENABLE_L indicated. 

• 

• M%ABORT_CBOX_IRD_H: Indicates that any IREAD which the Cbox may be processing should 
be immediately terminated. 

• M%CBOX_BYPASS_ENABLE_H: Indicates that the Cbox may drive B%S6_DATA_H<63:0> during 
the following cycle in order to attempt a data bypass. 

12.8 INITIALIZATION 

12.8.1 Power-up Initialization 

The signal K_M%RESET_L is asserted during the power-up reset sequence. The following state is 
forced whenever K_M%RESET_L is asserted: 

• EM_LATCH valid bit is cleared. 

• VAP_LATCH valid bit is cleared. 

• MME_LATCH valid bit is cleared. 

• RTY_DMISS_LATCH valid bit is cleared. 

• DMISS_LATCH valid bit is cleared. 

• MME state machine is forced to the home state. 

• PCCTL<8:0> are cleared (this disables the Pcache). 

The power-up reset sequence also causes the assertion of E%FLUSH_MBOX. E%FLUSH_MBOX will 
cause the following state to be forced within the context of the power-up sequence: 

• The SPEC_QUEUE valid bits are cleared. 

• The SPEC_Q_SYNC_CTR is reset to 0. Note that a subsequent E%RESTART_SPEC_Q signal is 
expected to enable SPEC_QUEUE arbitration. 

• MMESTS<31:29> are cleared. This invalidates and unlocks the MMESTS register. 

See Section 12.3.21.2 for a complete description of all state changes due to E%FLUSH_MBOX. 

Once E%FLUSH_MBOX has been asserted, E%FLUSH_PA_QUEUE will be asserted during a 
subsequent cycle. E%FLUSH_PA_QUEUE will cause all PA_QUEUE valid bits to be cleared. 

The power-up reset sequence also causes the assertion of I%FLUSH_IREF_LAT. I%FLUSH_IREF_LAT 
will cause the following state to be forced within the context of the power-up sequence: 

• The IREF_LATCH valid bit is cleared. 

• The IMISS_LATCH valid bit is cleared. 

See Section 12.3.21.1 for a complete description of all state changes due to I%FLUSH_IREF_LAT. 

12.8.2 Initialization by Microcode and Software 

It is the responsibility of the power-up microcode to perform an IPR_WRITE operation to clear 
MAPEN before any virtual memory references are issued to the Mbox from either the Ebox or 
Ibox. Failure to clear MAPEN could result in UNDEFINED behavior prior to complete memory 
management state initialization. 

PAMODE is also cleared by the power-up microcode via an IPR_WRITE command. If the system 
configuration requires a 32 bit program-visible physical address space, setting the PAMODE value 
via an IPR_WRITE must be done under very controlled conditions because writes to the PAMODE 
processor register affect both physical address generation and interpretation of PTEs. With the 
possible exception of certain diagnostic code, writes to the PAMODE processor register should 
not be performed while memory management is enabled. With memory management disabled, 
writes to the PAMODE processor register should not be performed unless the PC of the MTPR 
instruction which writes to the register is in one of the following (hex) address ranges: 

00000000..1FFFFFFF 
E0000000..FFFFFFFF 

By restricting PC to one of these address ranges, changes to the PAMODE register do not cause 
the generated physical address to change in going from 30-bit mode to 32-bit mode, or vice versa. 
At powerup, microcode fetches the initial instruction from the boot ROM at address E0040000 
(hex), which is in the second of the ranges shown above. Therefore, the console code in the boot 
ROM may write to the PAMODE processor register, and it is expected that this is the place where 
the PAMODE processor register will be initialized. 

In uncontrolled conditions, writes to the PAMODE processor register can cause UNDEFINED 
results. 
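
As an illustration of the PC restriction, a diagnostic tool could check an MTPR's PC against 
the two ranges before allowing a PAMODE write. The sketch below is not part of the NVAX 
microcode or console code; the names SAFE_PC_RANGES and pc_safe_for_pamode_write are 
hypothetical, and the first range is read as 00000000..1FFFFFFF (hex). 

```python
# Address ranges (hex) in which the generated physical address is the same
# in 30-bit and 32-bit mode. The first upper bound is an assumed reading
# of the listed range.
SAFE_PC_RANGES = (
    (0x00000000, 0x1FFFFFFF),
    (0xE0000000, 0xFFFFFFFF),
)


def pc_safe_for_pamode_write(pc: int) -> bool:
    """Return True if an MTPR at this PC may write PAMODE with memory
    management disabled, according to the ranges above."""
    return any(low <= pc <= high for low, high in SAFE_PC_RANGES)
```

For example, pc_safe_for_pamode_write(0xE0040000) is true for the boot ROM entry address 
mentioned above, while an address such as 0x20000000 falls outside both ranges. 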

12.8.2.1 Pcache Initialization 

The Pcache is disabled by the power-up initialization sequence. In order to enable the Pcache, 
the following sequential actions must be performed: 

1. Pcache IPR_WRITE operations must be performed to each Pcache tag to write the tag field 
to a known state, set the tag parity bit to the corresponding value, and clear the subblock 
valid bits. 

2. The lock bit in PCSTS must be cleared so that a locked PCSTS will not inhibit turning on 
the Pcache. 

3. An IPR_WRITE to the PCCTL must be done to enable the Pcache in the desired operation 
mode. This step effectively turns the Pcache on. 

Note that the data array need not be initialized because correct parity will be written into the data 
array whenever fill data is validated, and data parity is only checked on validated sub-blocks. 

12.8.2.2 Memory Management Initialization 

Memory management is disabled by MAPEN being cleared by the power-up microcode. Before 
memory management can be turned on, the following actions must be performed: 

• The Ebox must issue a TBIA command to invalidate the TB and reset the NLU pointer to a 
known state. This is done as part of the microcode processing of an MTPR to MAPEN. 

• The Ebox must write the appropriate values into the six memory base and length registers 
via IPR_WRITE commands. 

Once this is done, the Ebox may turn on memory management by setting MAPEN through an 
IPR_WRITE command. 

12.9 Mbox Testability Features 

This section describes which testability features are used for the Mbox, and which 
Mbox signals are used for each testability function. For a global understanding of NVAX 
testability, and for a detailed description of each testability strategy and hardware mechanism, 
the reader is referred to Chapter 19. 



12.9.1 Internal Scan Register and Data Reducers 

The following lists Mbox signals which are captured in the internal scan chain. The signals are 
listed in the order in which they are serially shifted out. Therefore, the first signal listed is the 
first signal shifted out. If a bus of signals is listed in the form signal<x:y>, y represents the first 
bit to be shifted out; x represents the last bit of the bus to be shifted out. 



Captured Signal Name              Description 

M_QUE_QU3%SQ_VAL0_LAST_H<>        cycle-delayed valid bit for 0th entry Spec Queue 
M_QUE_QU3%SQ_VAL1_LAST_H<>        cycle-delayed valid bit for 1st entry Spec Queue 
M_QUE_QU1%EM_VAL_LAST_H<>         cycle-delayed valid bit for EM_LATCH 
M_QUE_QU8%PAQ_STATUS_P5_H<2:0>    Status bits for PA_QUEUE 
M_QUE_QU8%MME_TRAP_P3_H<>         Memory Management Exception Trap signal 
M_QUE_QU8%RTY_VAL_LAST_P2_H<>     cycle-delayed valid bit for RTY_DMISS_LATCH 
M_S6C_TST%CBOX_REF_EN_P?_H<>      Indicates S6 read reference is for Cbox 
M%CBOX_BYPASS_ENABLE_H<>          Enables bypassing of Cbox cache fill data 
M_S6C_TST%EM_LAT_FULL_P?_H<>      Indicates EM_LATCH backpressure status to Ebox 
M_S6C_TST%VAP_VAL_LAST_H<>        cycle-delayed valid bit for VAP_LATCH 
M_QUE_QU0%IREF_VAL_LAST_H<>       cycle-delayed valid bit for IREF_LATCH 
M_QUE_QU1%MME_VAL_LAST_H<>        cycle-delayed valid bit for MME_LATCH 
M_QUE_SEL%S5_PA_L3_H<9:31>        samples S5_PA Bus 
M_QUE_SEL%S5_PA_L3_H<0:8>         samples S5_PA Bus 
M_QUE%S5_AT_H<1:0>                Access type for S5 reference 
M_QUE%S5_TAG_H<4:0>               Ebox tag address for S5 reference 
M_QUE%S5_DEST_H<1:0>              Box destination code for S5 reference 
M_QUE%S5_CMD_H<4:0>               Command for S5 reference 
M_QUE%S5_DL_H<1:0>                Data length for S5 reference 
M_QUE%S5_QUAL_H<6:0>              Qualifier bits for S5 reference 



Note that only M_QUE%S5_PA_H<31:0> contains a data reducer. Implementing a data reducer on this 
bus should provide coverage for the Mbox S5 pipe as well as coverage for the Ibox, Ebox and Cbox 
logic which issue references to the Mbox. 

12.9.2 Nodes on Parallel Port 

The following signals are observable via the Parallel Port: 
— M_QUE%S5_CMD_H<4:0> 



Current Reference Source (3 encoded bits). The encodings are as follows: 

    Reference Source                      Encoding 

    NOP or PA_QUEUE (when cmd = STORE)    000 
    IREF_LATCH                            001 
    SPEC_QUEUE                            010 
    EM_LATCH (when cmd != STORE)          011 
    VAP_LATCH (when cmd != STORE)         100 
    MME_LATCH                             101 
    RTY_DMISS_LATCH                       110 
    CBOX_LATCH                            111 
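
When parallel port samples are post-processed, the 3-bit reference source field can be 
decoded with a simple lookup. The sketch below is a hypothetical helper for such a tool; the 
names REFERENCE_SOURCE and decode_reference_source are not defined by this specification. 

```python
# Encoding of the Current Reference Source field observed on the
# parallel port, taken from the list above.
REFERENCE_SOURCE = {
    0b000: "NOP or PA_QUEUE",
    0b001: "IREF_LATCH",
    0b010: "SPEC_QUEUE",
    0b011: "EM_LATCH",
    0b100: "VAP_LATCH",
    0b101: "MME_LATCH",
    0b110: "RTY_DMISS_LATCH",
    0b111: "CBOX_LATCH",
}


def decode_reference_source(bits: int) -> str:
    """Map a 3-bit sample to the name of the reference source."""
    if not 0 <= bits <= 0b111:
        raise ValueError("reference source field is 3 bits wide")
    return REFERENCE_SOURCE[bits]
```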


M_QUE_QU5%ABORT_P4_H 




M_MME JVIMD%TB_MISS_L3_H 




M_PC_BSL%PCACHE_HIT_P4_H 




MME state machine state bits (4 encoded bits). The encodings are as follows: 

    State Name           Encoding 

    home                 0000 
    tb_miss_1            0001 
    tb_miss_2            0010 
    tb_miss_3            0011 
    tb_miss_4            0100 
    tb_miss_5            0101 
    doub_tb_miss_1       0110 
    doub_tb_miss_2       0111 
    doub_tb_miss_3       1000 
    doub_tb_miss_4       1001 
    mme_1                1010 
    mme_2                1011 
    ipr_rd_1_tb_per_2    1100 
    xpage_1              1101 
    tb_per_1             1110 
    undefined            1111 
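
For trace analysis, the MME state bits can be held in an enumeration so that captured 
values print as state names. This is a sketch under assumptions: the class MmeState and the 
function decode_mme_state are hypothetical, and the state names are read with trailing digits 
(for example tb_miss_1). 

```python
from enum import IntEnum


class MmeState(IntEnum):
    """MME state machine encodings as observed on the parallel port."""
    HOME = 0b0000
    TB_MISS_1 = 0b0001
    TB_MISS_2 = 0b0010
    TB_MISS_3 = 0b0011
    TB_MISS_4 = 0b0100
    TB_MISS_5 = 0b0101
    DOUB_TB_MISS_1 = 0b0110
    DOUB_TB_MISS_2 = 0b0111
    DOUB_TB_MISS_3 = 0b1000
    DOUB_TB_MISS_4 = 0b1001
    MME_1 = 0b1010
    MME_2 = 0b1011
    IPR_RD_1_TB_PER_2 = 0b1100
    XPAGE_1 = 0b1101
    TB_PER_1 = 0b1110


def decode_mme_state(bits: int) -> str:
    """Return the lower-case state name; 1111 has no defined state."""
    try:
        return MmeState(bits & 0xF).name.lower()
    except ValueError:
        return "undefined"
```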



— MD_BUS Qualifiers (3 encoded bits). The encodings are as follows: 

    Event                 Encoding 

    undefined             000 
    Ibox data             001 
    Ebox data             010 
    Ibox and Ebox data    011 
    VIC data              100 
    Ibox IPR data         101 
    undefined             110 
    Mbox data             111 



— M%MME_FAULT_H 

12.9.3 Nodes on Top Metal 

tbd 

12.9.4 Architectural features 

The following is a brief description of all the Mbox architectural features which are relevant to 
verification, debug, and chip test. All of these features are invoked through the use of IPRs which 
are defined at the NVAX instruction set level. All of these IPRs can be invoked through the use 
of MTPR or MFPR macroinstructions. See the Architectural Summary Chapter for a list of all 
Mbox IPR addresses. Note that Mbox IPR addresses referenced through the MxPR instruction 
are translated by the Ebox microcode into IPR_RD, IPR_WR, TBIS, TBIA, or PROBE operations 
before being issued to the Mbox. 

12.9.4.1 Translation Buffer Testability 

The diagnostic user can invalidate the entire TB array by executing an MTPR instruction which 
addresses the TBIA IPR. This operation will also reset the NLU pointer. The user can invalidate 
any virtual page address which may be cached in the TB by executing an MTPR addressing the TBIS 
IPR. 

The diagnostic user can explicitly query the TB to determine if a given tag is validated and 
stored in the TB. This is accomplished by addressing the Translation Buffer Check IPR through 
the MTPR instruction. 

Every TB entry can be explicitly filled and validated by the diagnostic user through the use of the 
TB_TAG_FILL and TB_PTE_FILL commands. The entry on which these two commands operate 
at any given time is addressed by the NLU pointer. The NLU pointer is a round robin pointer 
which increments when a TB_PTE_FILL is executed or when a tag match is detected on the entry 
which the NLU pointer is currently pointing to. The NLU pointer is reset to point to the 0th 
entry whenever a TBIA command is executed. 
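
The NLU pointer behavior described above can be modeled in a few lines, which may help when 
planning a diagnostic that fills specific TB entries. This is a behavioral sketch only; the 
class name NluPointer and its method names are hypothetical, and the number of TB entries is 
left as a parameter because it is not stated in this section. 

```python
class NluPointer:
    """Round robin pointer selecting the TB entry for fill commands."""

    def __init__(self, num_entries: int):
        if num_entries <= 0:
            raise ValueError("num_entries must be positive")
        self.num_entries = num_entries   # assumed parameter, see lead-in
        self.index = 0

    def _advance(self) -> None:
        self.index = (self.index + 1) % self.num_entries

    def on_tb_pte_fill(self) -> None:
        # A TB_PTE_FILL always moves the pointer to the next entry.
        self._advance()

    def on_tag_match(self, matched_entry: int) -> None:
        # A tag match moves the pointer only when it hits the entry the
        # pointer currently addresses.
        if matched_entry == self.index:
            self._advance()

    def on_tbia(self) -> None:
        # TBIA resets the pointer to the 0th entry.
        self.index = 0
```

With a four-entry model, one TB_PTE_FILL moves the pointer from entry 0 to entry 1; a tag 
match on entry 3 then leaves it unchanged, while a tag match on entry 1 advances it to entry 2. 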

It is the responsibility of the diagnostic user to set his/her tests up such that normal I-stream and 
D-stream references generated in the macropipeline do not interfere with the TB state under test. 
Specifically, the user must guarantee that all relevant pages of the diagnostic program reside in 
the TB before the test begins, such that accessing these pages will not cause modification of the 
TB state while the diagnostic program is explicitly probing and changing TB state. 

See Section 12.5.1.3 for a complete description of TB function as it relates to testability. See 
Section 12.3.11.2 for a description of the PROBE command which can be invoked through the 
Translation Buffer Check IPR. 

12.9.4.2 Pcache Testability 

Every bit in the Pcache can be read and written by the user through DREAD, WRITE, IPR_RD 
and IPR_WR operations. Pcache data is accessed by DREADs and WRITEs. All other bits (tag, valid 
bits and parity bits) are accessed through Mbox IPRs. 

The operational mode of the Pcache can be changed to accommodate testing the array. The mode 
is controlled by the Pcache Control Register (PCCTL) which can be read and written as an Mbox 
IPR. The PCCTL allows the user to: 

1. Enable/disable D-stream and/or I-stream operations to the Pcache. 

2. Allow the Pcache to operate in a direct mapped force hit mode. 

3. Enable/disable Pcache parity checks. 

See Section 12.4 for a complete description of Pcache function as it relates to testability. 

12.9.5 M-BOX Miscellaneous Features 

— tbd 

12.10 Mbox Performance Monitor Hardware 

Hardware exists in the Mbox to support the NVAX Performance Monitoring Facility. See 
Chapter 18 for a global description of this facility. 

The Mbox hardware generates two signals, M%PMUX0_H and M%PMUX1_H, which are driven to the 
central performance monitoring hardware residing in the Ebox. These two signals are used to 
supply Mbox performance data for the purpose of recording performance statistics. Seven Mbox 
performance monitoring functions exist. The function to be executed is specified by the PMM 
field of the PCCTL register (see Figure 12-31). 

The following describes the seven Mbox performance monitor modes: 

Table 12-27: Mbox Performance Monitor Modes 

PCCTL<7:5>   Performance Monitor Mode
----------   ------------------------
000          TB hit rate for S0 Space I-stream Reads [1]
001          TB hit rate for S0 Space D-stream Reads [1]
010          TB hit rate for P0/P1 Space I-stream Reads [1]
011          TB hit rate for P0/P1 Space D-stream Reads [1]
100          Pcache hit rate for I-stream Reads
101          Pcache hit rate for D-stream Reads
110          Illegal mode: results are UNPREDICTABLE
111          Ratio of unaligned virtual reads and virtual writes to total virtual reads
             and virtual writes

[1] TB hit count is unconditionally incremented when MAPEN=0. 
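
For software that programs or reports the PMM field, the mode table can be captured as a simple lookup. This is a sketch under stated assumptions: the dictionary and function names are hypothetical, and the function takes the 3-bit PMM code directly rather than a full PCCTL register image.

```python
# 3-bit PMM code -> mode description, transcribed from Table 12-27.
# None marks the illegal code, whose results are UNPREDICTABLE.
PMM_MODES = {
    0b000: "TB hit rate for S0 space I-stream reads",
    0b001: "TB hit rate for S0 space D-stream reads",
    0b010: "TB hit rate for P0/P1 space I-stream reads",
    0b011: "TB hit rate for P0/P1 space D-stream reads",
    0b100: "Pcache hit rate for I-stream reads",
    0b101: "Pcache hit rate for D-stream reads",
    0b110: None,
    0b111: "ratio of unaligned virtual reads and writes to total virtual reads and writes",
}


def describe_pmm(pmm: int) -> str:
    """Return the monitor mode selected by a 3-bit PMM code (hypothetical helper)."""
    if pmm not in PMM_MODES:
        raise ValueError("PMM is a 3-bit field")
    mode = PMM_MODES[pmm]
    if mode is None:
        raise ValueError("PMM=110 is an illegal mode; results are UNPREDICTABLE")
    return mode
```

Rejecting the illegal code in software is a design choice of this sketch; the hardware itself only specifies that the results are UNPREDICTABLE.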



12.10.1 TB hit rate Performance Monitor Modes 

The TB hit rate modes work by asserting M%PMUX0_H during the cycle in which a specific type 
of virtual read reference is first attempted in the S5 execution pipe. During the same cycle, 
M%PMUX1_H will transfer the TB hit status corresponding to this read execution event. 

It is important to capture this data only on the first execution of the read so that the TB 
hit statistics are not skewed by multiple retries of the same reference due to aborted cycles and 
tb_miss sequences. 

One low probability scenario exists in which this scheme will not accurately record the TB hit/miss 
data for the reference. Consider the case where the read is initially executed and is found to hit 
in the TB while simultaneously being aborted due to some abort condition (e.g. Pcache Index 
Conflict). During the following cycle, another reference is executed which invokes a TB miss 
sequence. If the TB miss sequence displaces the PTE corresponding to the first read, then the 
read will subsequently be retried as a TB miss event even though it has already been recorded 
as a TB hit event. However, the frequency of this scenario should normally be so low that the 
accuracy of the TB hit ratio statistics will not be affected. 
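
The way the two signals combine into a hit rate can be sketched in software. This is a minimal model, assuming one (M%PMUX0_H, M%PMUX1_H) sample per cycle; the function name and the sample format are hypothetical, and the real counters live in the Ebox performance monitoring hardware described in Chapter 18.

```python
from typing import Iterable, Tuple


def hit_rate(samples: Iterable[Tuple[int, int]]) -> float:
    """Compute hits/events from per-cycle (pmux0, pmux1) samples.

    pmux0 marks a cycle in which a qualifying reference is first attempted;
    pmux1 carries the hit status and is only meaningful when pmux0 is set.
    """
    events = 0
    hits = 0
    for pmux0, pmux1 in samples:
        if pmux0:
            events += 1
            if pmux1:
                hits += 1
    if events == 0:
        raise ValueError("no qualifying events were sampled")
    return hits / events
```

The same ratio applies to the unaligned reference mode, where the first signal marks a qualifying virtual reference and the second marks that the reference is unaligned.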

12.10.1.1 TB hit rate for P0/P1 I-stream Reads 

In this mode, M%PMUX0_H is asserted during the cycle in which the IREF_LATCH first attempts 
to drive a virtual process space IREAD into the S5 pipe. Note that M%PMUX0_H is only asserted 
in response to IREAD execution events caused by Ibox-generated IREADs. This avoids recording 
Mbox-generated "fill forward" IREADs which would abnormally boost the TB hit rate. During 
the same cycle, M%PMUX1_H will transfer the TB hit status corresponding to the same IREAD 
execution event. 

12.10.1.2 TB hit rate for P0/P1 D-stream Reads 

In this mode, M%PMUX0_H is asserted during the cycle in which the SPEC_QUEUE, EM_LATCH, 
VAP_LATCH or MME_LATCH first attempts to drive a virtual process space read into the S5 
pipe. During the same cycle, M%PMUX1_H will transfer the TB hit status corresponding to the 
same read execution event. 

12.10.1.3 TB hit rate for S0 I-stream Reads 

In this mode, M%PMUX0_H is asserted during the cycle in which the IREF_LATCH first attempts 
to drive a system space IREAD into the S5 pipe. Note that M%PMUX0_H is only asserted in 
response to IREAD execution events caused by Ibox-generated IREADs. This avoids recording 
Mbox-generated "fill forward" IREADs which would abnormally boost the TB hit rate. During 
the same cycle, M%PMUX1_H will transfer the TB hit status corresponding to the same IREAD 
execution event. 

12.10.1.4 TB hit rate for S0 D-stream Reads 

In this mode, M%PMUX0_H is asserted during the cycle in which the SPEC_QUEUE, EM_LATCH, 
VAP_LATCH or MME_LATCH first attempts to drive a virtual system space read into the S5 
pipe. During the same cycle, M%PMUX1_H will transfer the TB hit status corresponding to the 
same read execution event. 

12.10.2 Pcache hit rate Performance Monitor Modes 

The Pcache hit rate modes work by asserting M%PMUX0_H during the cycle in which a specific 
type of S6 physical read reference is executed in the Pcache. During the same cycle, M%PMUX1_H 
will transfer the Pcache hit status corresponding to this read execution event. 

12.10.2.1 Pcache hit rate for I-stream Reads 

In this mode, M%PMUX0_H is asserted during the cycle in which an IREAD is executing in 
the S6 pipe. M%PMUX0_H is only asserted in response to IREAD execution events caused by 
Ibox-generated IREADs. This avoids recording Mbox-generated "fill forward" IREADs which 
would abnormally boost the Pcache hit rate. M%PMUX1_H will transfer the Pcache hit status 
corresponding to the same IREAD execution event during the cycle in which M%PMUX0_H is asserted. 

12.10.2.2 Pcache hit rate for D-stream Reads 

In this mode, M%PMUX0_H is asserted during the cycle in which a D-stream read is executing 
in the S6 pipe. M%PMUX0_H is only asserted in response to the first Pcache lookup attempt of 
a D-stream read executing in the S6 pipe. This avoids skewing the performance data based on 
the same reference being retried in the Pcache due to the "read under fill" function. Therefore, 
S6 reads originating from the RTY_DMISS_LATCH do not cause the assertion of M%PMUX0_H. 
M%PMUX1_H will transfer the Pcache hit status corresponding to the same read execution event 
during the cycle in which M%PMUX0_H is asserted. 

12.10.3 Unaligned reference statistics 

This mode allows the user to obtain the percentage of references processed by the Mbox which 
are unaligned. 

In this mode, M%PMUX0_H is asserted on any virtual read, virtual DEST_ADDR, or virtual 
WRITE reference driven from the SPEC_QUEUE or EM_LATCH. The reference must be virtual 
to be recorded due to the nature of the hardware implementation. M%PMUX1_H is asserted on the 
same conditions as M%PMUX0_H, except that it is further qualified by the fact that the reference 
is unaligned. 

12.11 Mbox Signal Name Cross-Reference 

All signal names referenced in this chapter have appeared in bold and reflect the actual name 
appearing in the NVAX schematic set. For each signal appearing in this chapter, the table below 
lists the corresponding name which exists in the behavioral model. 



Table 12-28: Cross-reference of all names appearing in the Mbox chapter 

Schematic Name                 Behavioral Model Name
--------------                 ---------------------
B%S6_DATA_H<63:0>              B%S6_DATA_H<63:0>
C%CBOX_CMD_H<1:0>              C%CBOX_CMD_H<1:0>
C%CBOX_ADDR_H<31:5>            C%CBOX_ADDR_H<31:5>
C%MBOX_FILL_QW_H<4:3>          C%MBOX_FILL_QW_H<4:3>
C%REQ_DQW_H                    C%REQ_DQW_H
C%S6_DP_H<7:0>                 C%S6_DP_H<7:0>
C%LAST_FILL_H                  C%LAST_FILL_H
C%CBOX_HARD_ERR_H              C%CBOX_HARD_ERR_H
C%CBOX_ECC_ERR_H               C%CBOX_ECC_ERR_H
C%WR_BUF_BACK_PRES_H           C%WR_BUF_BACK_PRES_H
E%EBOX_CMD_H<4:0>              E%EBOX_CMD_H<4:0>
E%VA_BUS_H<31:0>               E%VA_BUS_H<31:0>
E%WBUS_H<31:0>                 E%WBUS_H<31:0>
E%EBOX_TAG_H<4:0>              E%EBOX_TAG_H<4:0>
E%EBOX_AT_H<1:0>               E%EBOX_AT_H<1:0>
E%EBOX_DL_H<1:0>               E%EBOX_DL_H<1:0>
E%EBOX_VIRT_ADDR_H             E%EBOX_VIRT_ADDR_H
E%MMGT_MODE_H<1:0>             E%MMGT_MODE_H<1:0>
E%CUR_MODE_H<1:0>              E%CUR_MODE_H<1:0>
E%EREF_REQ_H                   E%EREF_REQ_H
E%EM_ABORT_L                   E%EM_ABORT_H
                               E%FLUSH_MBOX_H
E%FLUSH_PA_QUEUE_H             E%FLUSH_PA_QUEUE_H
E%START_IBOX_IO_RD_H           E%START_IBOX_IO_RD_H
E%RESTART_SPEC_QUEUE_H         E%RESTART_SPEC_QUEUE_H
E%NO_MME_CHECK_H               E%NO_MME_CHECK_H
I%IBOX_CMD_L<4,1:0>            I%IBOX_CMD_H<4:0>
I%IBOX_ADDR_H<31:0>            I%IBOX_ADDR_H<31:0>
I%IBOX_TAG_L<2:0>              I%IBOX_TAG_H<2:0>
I%IBOX_AT_L<1:0>               I%IBOX_AT_H<1:0>
I%IBOX_DL_L<1:0>               I%IBOX_DL_H<1:0>
I%IBOX_REF_DEST_L<1:0>         I%IBOX_REF_DEST_H<1:0>
                               I%IREF_REQ_H
I%SPEC_REQ_H                   I%SPEC_REQ_H
I%FORCE_MME_FAULT_H            I%FORCE_MME_FAULT_H
I%FORCE_HARD_FAULT_H           I%FORCE_HARD_FAULT_H
I%FLUSH_IREF_LAT_H             I%FLUSH_IREF_LAT_H
M%ABORT_CBOX_IRD_H             M%ABORT_CBOX_IRD_H
M%C_S6_PA_H<2:0>               M%C_S6_PA_H<2:0>
M%CBOX_BYPASS_ENABLE_H         M%CBOX_BYPASS_ENABLE_H
M%CBOX_LATE_EN_H               M%CBOX_LATE_EN_H
M%CBOX_REF_ENABLE_L            M%CBOX_REF_ENABLE_H
M%EBOX_DATA_H                  M%EBOX_DATA_H
M%EM_LAT_FULL_H                M%EM_LAT_FULL_H
M%HARD_ERR_H                   M%HARD_ERR_H
M%IBOX_DATA_L                  M%IBOX_DATA_H
M%IBOX_IPR_WR_H                M%IBOX_IPR_WR_H
                               M%LAST_FILL_H
M%MBOX_S_ERROR_H               M%MBOX_S_ERROR_H
M%MD_BUS_H<63:0>               M%MD_BUS_H<63:0>
M%MD_BUS_QW_PARITY_L           M%MD_BUS_QW_PARITY_H
M%MD_TAG_H<4:0>                M%MD_TAG_H<4:0>
M%MME_FAULT_H                  M%MME_FAULT_H
M%PA_Q_STATUS_H<2:0>           M%PA_Q_STATUS_H<2:0>
M%PMUX0_H                      M%PMUX0_H
M%PMUX1_H                      M%PMUX1_H
M%QW_ALIGNMENT_H<1:0>          M%QW_ALIGNMENT_H<1:0>
M%SPEC_Q_FULL_H                M%SPEC_Q_FULL_H
M%S6_BYTE_MASK_H<7:0>          M%S6_BYTE_MASK_H<7:0>
M%S6_CMD_H<4:0>                M%S6_CMD_H<4:0>
M%S6_PA_H<31:0>                M%S6_PA_H<31:0>
M%VIC_DATA_L                   M%VIC_DATA_H
M_QUE%S5_AT_H<1:0>             M_QUE%S5_AT_H<1:0>
M_QUE%S5_CMD_L<4:0>            M_QUE%S5_CMD_H<4:0>
M_QUE%S5_DATA_H<31:0>          M_QUE%S5_DATA_H<31:0>
M_QUE%S5_DEST_H<1:0>           M_QUE%S5_DEST_H<1:0>
M_QUE%S5_DL_H<1:0>             M_QUE%S5_DL_H<1:0>
M_QUE%S5_PA_H<31:0>            M_QUE%S5_PADP_H<31:0>
M_QUE%S5_QUAL_H<6:0>           M_QUE%S5_QUAL_H<6:0>
M_QUE%S5_TAG_H<4:0>            M_QUE%S5_TAG_H<4:0>
M_QUE%S5_VA_H<31:0>            M_QUE%S5_VA_H<31:0>
Bt_S6C>BT%ABORT_L              M_S5C%ABORT_H

Add perf monitor hardware. Other tweaks 


Bill Wheeler 

Chapter 13 
The Cbox 

13.1 Terminology 

Term                          Meaning

Error transition mode (ETM)   Mode where the backup cache only services CPU requests to blocks
                              which are valid-owned. All other CPU requests, including those to
                              valid-unowned blocks, are ignored by the backup cache and are
                              forwarded to memory. The purpose is to use the cache as little as
                              possible because of previously detected errors.

Cache coherence transaction   A transaction from the external system which interrogates the
                              backup cache and may cause a block invalidate and/or a block
                              writeback.

Deallocate                    The actions necessary to allocate a new block because of a read
                              miss or a write miss. A writeback is required if the block is
                              valid-owned. An invalidate is required if the block is valid,
                              whether owned or unowned. A cache coherency request which results
                              in a hit also causes a deallocate.

Longword                      4 bytes of data
Quadword                      8 bytes of data
Hexaword                      32 bytes of data



13.2 Functional Overview of the Cbox and Backup Cache 

The Cbox is that section of the NVAX CPU chip which controls the backup cache and interfaces 
to the external bus. The Cbox includes the BIU functions for the NVAX CPU. The backup cache 
is a writeback cache. Cache tags and cache data are stored in off-chip static RAMs (off-the-shelf 
parts). The Cbox implements the control for the cache tags; control for the cache data; and control 
for the external pin bus, the NDAL. 

The Mbox sends read requests and writes to the Cbox; the Cbox sends fills and invalidates to the 
Mbox. The Cbox ensures that the Pcache is a subset of the backup cache through invalidates. 

The Cbox communicates with the memory subsystem (everything beyond the backup cache) via 
the NDAL. The Cbox generates reads and receives fills; it receives cache coherence transactions 
from the NDAL to which it responds with invalidates and writebacks, as appropriate. 

The reader is assumed to be familiar with Chapter 3, which describes the NDAL. 

Cache coherence in an NVAX system is based upon the concept of ownership. A hexaword block 
of memory may be owned either by memory or by an NVAX backup cache. In a multiprocessor 
system, only one of the caches or memory can own the block at a time. Several of the planned 
NVAX systems implement an explicit ownership bit for each hexaword block of memory; it would 
also be possible to build an NVAX system without explicit ownership bits in memory. 

13.2.1 The Cbox and the System 

The Cbox has a tightly coupled internal interface with the Mbox. It has separate external busses 
which communicate with the backup cache tag RAMs, the backup cache data RAMs, and the 
memory interface, as shown in Figure 13-1. 

Figure 13-1: The Cbox in the System 

[Block diagram. On-chip: the MBOX and PCACHE connect to the CBOX/BIU. Off-chip: the CBOX/BIU 
connects to the backup cache TAG RAMS and DATA RAMS, and over the NDAL to the MEMORY 
INTERFACE, which attaches to the SYSTEM MEMORY AND I/O BUS. The connections carry the 
labels 80, 41, and 92.] 

13.2.2 Writeback Cache and Ownership Concepts 

There is one fundamental difference between a writeback cache and a writethrough cache. When 
a write is received by a write-through cache, the data may be written into the cache and is 
always written to memory as well. When a write is received by a writeback cache, the write is 
not necessarily forwarded to memory; the write may be done only into the cache. The data is 
written back to memory only if another element in the system needs that data, or if the block is 
displaced (deallocated) from the cache. 
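
The difference in memory traffic can be illustrated with a toy count of memory writes for a single cached block. This is a generic sketch of the two policies, not a model of the Cbox: the function name and parameters are hypothetical, and it assumes the block is written at least once while cached.

```python
def memory_writes(policy: str, writes_to_block: int, written_back: bool) -> int:
    """Count writes that reach memory for one cached block.

    writes_to_block: number of writes received for the block (at least 1).
    written_back: True if the block is later displaced or another element
                  in the system needs the data (writeback policy only).
    """
    if writes_to_block < 1:
        raise ValueError("this sketch assumes at least one write")
    if policy == "write-through":
        # Every write is always written to memory as well.
        return writes_to_block
    if policy == "writeback":
        # Writes stay in the cache; memory sees at most one block writeback.
        return 1 if written_back else 0
    raise ValueError("unknown policy: " + policy)
```

For five writes to the same block, the write-through count is five, while the writeback count is zero until the block is written back and one afterwards.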

The NVAX backup cache is a writeback design in which a cache block may exist in one of three 
states: invalid, valid-unowned, and valid-owned. A block which is valid-unowned is a read-only 
copy of memory data. A block which is valid-owned may be written by NVAX, and if it has been 
written since being put into the cache, is the only up-to-date copy of the data in the system. The 
NVAX cache makes no distinction between valid-owned blocks it has written and those which it 
has not written. 

A valid-unowned copy of a given cache block may reside in one or more backup caches in an 
NVAX multiprocessor system. No NVAX backup cache may contain a valid cache block which is 
valid-owned by another backup cache in the system. The Cbox design relies upon the system bus 
and/or the system bus interface to support NDAL Ownership Read/Disown Write pairs to ensure 
cache coherency. 

The most straightforward way to implement a memory for NVAX is to have an ownership bit 
associated with each hexaword of data. When this memory receives an Ownership Read (OREAD) 
for a hexaword, ownership is passed to the requesting CPU, and the data is returned to the CPU. 
If another Ownership Read arrives for that hexaword from a second CPU, memory does not 
return the data since the hexaword is not owned by memory but by the first CPU. The first CPU 
recognizes the second OREAD as a cache coherence transaction and writes back the data from 
its cache, using the Disown Write command. The data is then available for the second CPU. 

During normal operation, the Cbox issues an OREAD to the memory interface and receives 
ownership of the block before it performs a write to that block in the backup cache. The Cbox 
relinquishes ownership of the data when a cache coherence transaction requesting a writeback 
appears on the NDAL. 
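
The ownership handshake described in this section can be sketched as a small simulation. This is a simplified illustration under stated assumptions: the `Bus` and `Cache` classes are hypothetical, the bus stands in for both the NDAL and a memory with one ownership bit per hexaword, and only the OREAD and Disown Write cases described above are modeled.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    VALID_UNOWNED = "valid-unowned"
    VALID_OWNED = "valid-owned"


class Bus:
    """Hypothetical stand-in for the NDAL plus memory ownership bits."""

    def __init__(self):
        self.caches = []
        self.owner = {}  # block address -> owning Cache; missing key: memory owns it

    def oread(self, requester, block):
        # Every other cache sees the OREAD as a cache coherence transaction.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_oread(block)
        self.owner[block] = requester  # ownership passes to the requester

    def disown_write(self, block):
        self.owner.pop(block, None)  # data and ownership return to memory


class Cache:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.state = {}  # block address -> State; missing key means INVALID
        self.log = []
        bus.caches.append(self)

    def write(self, block):
        # A write is done only to an owned block; otherwise issue an OREAD first.
        if self.state.get(block, State.INVALID) is not State.VALID_OWNED:
            self.bus.oread(self, block)
            self.state[block] = State.VALID_OWNED
        self.log.append(("write in cache", block))

    def snoop_oread(self, block):
        current = self.state.get(block, State.INVALID)
        if current is State.VALID_OWNED:
            # Write the data back and relinquish ownership.
            self.log.append(("Disown Write", block))
            self.bus.disown_write(block)
        if current is not State.INVALID:
            self.state[block] = State.INVALID
```

If cache A writes a block and cache B then writes the same block, A logs a Disown Write and ends up invalid, and B ends up valid-owned.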

13.2.3 Backup Cache Operating Modes 

The backup cache has four distinct modes of operation. 

• Cache ON. Normal operation. Most of this chapter describes Cbox operation when the backup 
cache is on. 

• Cache OFF. Reset puts the backup cache into the OFF state. The backup cache may be 
enabled/disabled (turned ON/OFF) by software through the Cbox control IPR. Cache off mode 
is described in Section 13.9.1. 

• Force Hit. The Cbox forces all memory space reads and writes to hit in the backup cache. 
This mode is used for testing and initialization purposes. Force Hit mode is described in 
Section 13.9.2. 

• Error Transition Mode. The Cbox enters Error Transition Mode upon recognition of some 
error conditions or when put into ETM explicitly by an IPR write. Error Transition Mode is 
described in Section 13.9.3. 

13.3 NVAX Backup Cache Organization and Interface 

The backup cache is configurable based on the size and speed of the cache RAMs used to 
implement the cache on the board. 

The backup cache may be configured to be one of four sizes: 128 kilobytes, 256 kilobytes, 512 
kilobytes, or 2 megabytes. This is controlled by the SIZE field in the CCTL register, as described 
in Section 13.5.1. The smallest RAMs which may be used to achieve each configuration are shown 
in Table 13-1. 



Table 13-1: Backup Cache Size and RAMs Used 

Cache size      Tag RAM Size   Data RAM Size   Number of Tags   Valid Bits Per Tag
----------      ------------   -------------   --------------   ------------------
128 Kilobytes   4K x 4         16K x 4         4K               1
256 Kilobytes   8K x 8 [1]     32K x 8 [1]     8K               1
512 Kilobytes   16K x 4        64K x 4         16K              1
2 Megabytes     64K x 4        256K x 4        64K              1

[1] Using x8 parts means the cache no longer takes advantage of the nibble protection feature of 
the cache ECC design. 



Regardless of configuration, the cache has a block size of 32 bytes and has no subblocks. The 
data bus to the cache is 8 bytes wide, so in order to read out an entire block, 4 accesses are done. 
Each block contains 32 bytes of data and has associated with it a tag, a valid bit, and an owned 
bit. ECC protection is provided on each quadword in the cache. ECC protection is also provided 
on the tag store. 
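
The tag counts and data RAM depths in Table 13-1 follow from the 32-byte block size and the 8-byte data bus. The arithmetic can be sketched as follows; the function and constant names are hypothetical.

```python
BLOCK_BYTES = 32     # cache block size (one hexaword)
DATA_BUS_BYTES = 8   # width of the data bus to the cache (one quadword)


def cache_geometry(cache_bytes: int) -> dict:
    """Derive per-configuration figures from the cache size in bytes."""
    return {
        # One tag per 32-byte block.
        "tags": cache_bytes // BLOCK_BYTES,
        # Quadword accesses needed to read out an entire block.
        "accesses_per_block": BLOCK_BYTES // DATA_BUS_BYTES,
        # One data RAM location per quadword.
        "data_ram_depth": cache_bytes // DATA_BUS_BYTES,
    }
```

For the 128-kilobyte configuration this gives 4K tags, 4 accesses per block, and a data RAM depth of 16K, which agrees with the first row of the table.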

Each of address bits <20:17> serves either as an index bit or as a tag bit, based on the cache size 
configured. Table 13-2 shows how the bits are used. 



Table 13-2: Tag and Index Interpretation based on cache size 

Cache size      Tag bits used   Index bits used
----------      -------------   ---------------
128 kilobytes   Tag<31:17>      Index<16:5>
256 kilobytes   Tag<31:18>      Index<17:5>
512 kilobytes   Tag<31:19>      Index<18:5>
2 megabytes     Tag<31:21>      Index<20:5>
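
The tag and index split in Table 13-2 can be expressed as a short helper. This is a sketch for illustration only: the function and table names are hypothetical, and the address is treated as a 32-bit physical address.

```python
# Most significant index bit for each cache size, from Table 13-2.
TOP_INDEX_BIT = {
    128 * 1024: 16,
    256 * 1024: 17,
    512 * 1024: 18,
    2 * 1024 * 1024: 20,
}


def split_address(pa: int, cache_bytes: int):
    """Return (tag, index) for a 32-bit physical address."""
    msb = TOP_INDEX_BIT[cache_bytes]
    index_bits = msb - 4                        # bits <msb:5> inclusive
    index = (pa >> 5) & ((1 << index_bits) - 1)
    tag = (pa & 0xFFFFFFFF) >> (msb + 1)        # bits <31:msb+1>
    return tag, index
```

Address bits <4:0> select a byte within the 32-byte block and are part of neither the tag nor the index.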



The backup cache speed may also be configured based on the access time of the RAMs used to 
implement the tag store and the data store. The TAG_SPEED and DATA_SPEED fields of the 
Cbox control register, CCTL, are used to control the number of NVAX cycles used by the Cbox to 
access the RAMs. The relationship between TAG_SPEED, DATA_SPEED, NVAX cycle time, and 
the cache RAM access times required is shown in Table 13-3. 

NOTE 

Table 13-3 is based upon simulations of the XNP (XMI-based system) board. These 
numbers may only be applied directly to an environment which is very close to that of 
the XNP. 



Table 13-3: Backup Cache RAM Speeds and NVAX Cycle Time 

           Tag RAM                                      Data RAM
CCTL       access time       tag read   tag write  CCTL        access time    data read  data write
TAG_SPEED  (access)          rep rate   rep rate   DATA_SPEED                 rep rate   rep rate
---------  ----------------  ---------  ---------  ----------  -------------  ---------  ----------

RAM speeds required for 16 ns NVAX cycle time
0          0 - 21 ns (2)     3 cycles   3 cycles   00 [1]      0 - 19.5 ns    2 cycles   3 cycles
1 [1]      22 - 37 ns (3)    4 cycles   4 cycles   01          20 - 35.5 ns   3 cycles   4 cycles
                                                   10          36 - 51.5 ns   4 cycles   5 cycles

RAM speeds required for 14 ns NVAX cycle time
0          0 - 17.5 ns (2)   3 cycles   3 cycles   00 [1]      0 - 16 ns      2 cycles   3 cycles
1 [1]      16 - 31.5 ns (3)  4 cycles   4 cycles   01          17 - 30 ns     3 cycles   4 cycles
                                                   10          31 - 44 ns     4 cycles   5 cycles

RAM speeds required for 12 ns NVAX cycle time
0          0 - 14 ns (2)     3 cycles   3 cycles   00 [1]      0 - 13 ns      2 cycles   3 cycles
1 [1]      15 - 26 ns (3)    4 cycles   4 cycles   01          14 - 25 ns     3 cycles   4 cycles
                                                   10          26 - 37 ns     4 cycles   5 cycles

RAM speeds required for 10 ns NVAX cycle time
0          0 - 10.5 ns (2)   3 cycles   3 cycles   00 [1]      0 - 9.5 ns     2 cycles   3 cycles
1 [1]      11 - 20.5 ns (3)  4 cycles   4 cycles   01          10 - 19.5 ns   3 cycles   4 cycles
                                                   10          20 - 29.5 ns   4 cycles   5 cycles

[1] TAG_SPEED=1 cannot be used with DATA_SPEED=00, as the NVAX Cbox cannot function with tag 
RAMs whose read access time is longer than the data RAM read access time. 
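
A board designer could check a proposed TAG_SPEED/DATA_SPEED setting against the table with a small lookup. This is only a sketch: the names are hypothetical, the limits are the upper ends of the access-time ranges in Table 13-3 (which apply to the XNP environment), and treating a faster RAM as acceptable for a slower setting is an assumption of this example.

```python
# Upper access-time limits in ns, indexed by speed setting, per NVAX cycle time.
MAX_ACCESS_NS = {
    16: {"tag": [21.0, 37.0], "data": [19.5, 35.5, 51.5]},
    14: {"tag": [17.5, 31.5], "data": [16.0, 30.0, 44.0]},
    12: {"tag": [14.0, 26.0], "data": [13.0, 25.0, 37.0]},
    10: {"tag": [10.5, 20.5], "data": [9.5, 19.5, 29.5]},
}


def settings_ok(cycle_ns, tag_speed, data_speed, tag_access_ns, data_access_ns):
    """Return True if the settings fit the RAMs under the table's limits."""
    limits = MAX_ACCESS_NS[cycle_ns]
    if tag_speed not in (0, 1) or data_speed not in (0b00, 0b01, 0b10):
        raise ValueError("TAG_SPEED is 0 or 1; DATA_SPEED is 00, 01 or 10")
    # Table footnote: TAG_SPEED=1 cannot be used with DATA_SPEED=00.
    if tag_speed == 1 and data_speed == 0b00:
        return False
    # Assumption: only the upper end of each range constrains the RAM.
    return (tag_access_ns <= limits["tag"][tag_speed]
            and data_access_ns <= limits["data"][data_speed])
```

For a 14 ns cycle, 15 ns tag and data RAMs fit TAG_SPEED=0 with DATA_SPEED=00, while 20 ns tag RAMs need TAG_SPEED=1 and therefore a DATA_SPEED of 01 or 10.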



Extensive simulations of the NVAX chip, package, and XNP board were done in order to determine 
the drive times of the cache pins in this environment. The drive times are measured from the internal 
NVAX clock to the signal being valid at the cache pin. The drive times for TT (typical speed) parts 
under worst-case conditions are shown in Table 13-4. These drive times would be met under worst- 
case conditions in the 14ns system. These drive times only apply to the XNP board, and cache drive 
times and performance would be different in a different environment. 

Table 13-4: Cache pin drive times in the XNP environment 

                                                                           Time to signal valid
NVAX Cache Interface Pin                 Starting clock                    at cache RAM
------------------------                 --------------                    --------------------
P%TS_TAG_H<31:17>, P%TS_ECC_H<5:0>,      K_PADL%PHI_4_H                    8.5 ns
P%TS_OWNED_H, P%TS_VALID_H

P%TS_TAG_H<31:17>, P%TS_ECC_H<5:0>,      K_PADL%PHI_4_H                    1.5 ns (tristate time)
P%TS_OWNED_H, P%TS_VALID_H

P%TS_INDEX_H<10:5>                       K_PADL%PHI_3_H                    8.0 ns

P%TS_INDEX_H<20:11>                      K_PADL%PHI_3_H                    8.0 ns

P%TS_OE_L                                K_PADL%PHI_1_H (assertion),       8.0 ns
                                         K_PADL%PHI_4_H (deassertion)

P%TS_WE_L                                K_PADL%PHI_3_H (assertion),       8.0 ns
                                         K_PADL%PHI_1_H (deassertion)

P%DR_INDEX_H<20:3>                       K_PADL%PHI_3_H                    8.0 ns

P%DR_OE_L                                K_PADL%PHI_1_H (assertion),       8.0 ns
                                         K_PADL%PHI_4_H (deassertion)

P%DR_WE_L                                K_PADL%PHI_3_H (assertion),       8.0 ns
                                         K_PADL%PHI_1_H (deassertion)

P%DR_DATA_H<63:0>, P%DR_ECC_H<7:0>       K_PADL%PHI_4_H                    8.5 ns

P%DR_DATA_H<63:0>, P%DR_ECC_H<7:0>       K_PADL%PHI_4_H                    1.5 ns (tristate time)



Figure 13-2 and Figure 13-3 show the timing of cache tag transactions and of cache data transactions. 
The symbols shown in the timing diagrams are defined in Table 13-5. 

Table 13-5: Cache pin timing symbol definitions 

Symbol   Meaning
------   -------
Taa      RAM address access time: valid index to RAM output valid
Toe      Assertion of output enable to RAM output valid
Toh      RAM output hold from address change
Tohz     Output disable to RAM output in high Z
Taw      Valid index to end of RAM write
Tdw      Data valid to end of RAM write
Tnz      NVAX tristate time
Twr      Write enable deassert to address change (write recovery)
Tdh      NVAX data hold time after write enable deassert
Twp      Write enable pulse width
Tas      RAM address setup time to write enable assertion

Figure 13-2: NVAX Backup Cache TAG RAM Pad Timing 

[Timing diagram, two panels: "TAG RAM read followed by another read" and "TAG RAM quadword 
write followed by read". Each panel shows the tag controller state (IDLE, LOOKUP, WRITE) against 
P1 through P4, with waveforms for P%TS_INDEX_H<20:5>, P%TS_TAG_H<31:17>, P%TS_ECC_H<5:0>, 
P%TS_OWNED_H, P%TS_VALID_H, P%TS_OE_L, and P%TS_WE_L. Annotations include the 8.0 ns and 
8.5 ns drive times, Tdw, Tnz (1.5 ns), Twp (13.0 ns), Twr, Tdh (2.5 ns), and the point where a 
slip cycle would be inserted.] 

Figure 13-3: NVAX Backup Cache DATA RAM Pad Timing 

[Timing diagram, two panels: "Data RAM read. Aborted due to tag miss." and "Data RAM quadword 
write followed by read". Each panel shows the data controller state (IDLE, WRITE) against P1 
through P4, with waveforms for P%DR_INDEX_H<20:3>, P%DR_DATA_H<63:0>, P%DR_ECC_H<7:0>, 
P%DR_OE_L, and P%DR_WE_L. Annotations include the 8.0 ns drive time, Twp (13.0 ns), Twr, Tdh, 
and the point where a slip cycle would be inserted.] 

13.3.1 Backup Cache Interface 

This section describes the NVAX pins dedicated to the backup cache interface. These are listed 
in Table 13-6. 



Table 13-6: NVAX Backup Cache Interface Pins 

Signal               Number   Input/output   Type
------               ------   ------------   ----
BACKUP CACHE TAG STORE SIGNALS (41 total)
P%TS_INDEX_H<20:5>   16       Output         One driver, six receivers
P%TS_OE_L            1        Output         One driver, six receivers
P%TS_WE_L            1        Output         One driver, six receivers
P%TS_TAG_H<31:17>    15       Input/Output   Tristate, seven drivers/receivers
P%TS_ECC_H<5:0>      6        Input/Output   Tristate, seven drivers/receivers
P%TS_OWNED_H         1        Input/Output   Tristate, seven drivers/receivers
P%TS_VALID_H         1        Input/Output   Tristate, seven drivers/receivers

BACKUP CACHE DATA RAM SIGNALS (92 total)
P%DR_INDEX_H<20:3>   18       Output         One driver, eighteen receivers
P%DR_OE_L            1        Output         One driver, eighteen receivers
P%DR_WE_L            1        Output         One driver, eighteen receivers
P%DR_DATA_H<63:0>    64       Input/Output   Tristate, nineteen drivers/receivers
P%DR_ECC_H<7:0>      8        Input/Output   Tristate, nineteen drivers/receivers



The pins listed are described in the sections which follow. 
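
The pin totals in Table 13-6, and the RAM part counts that follow from them when x4 parts are used, can be checked with a few lines of arithmetic. The dictionary and function names below are hypothetical; the pin counts are transcribed from the table.

```python
TAG_STORE_PINS = {
    "P%TS_INDEX_H<20:5>": 16, "P%TS_OE_L": 1, "P%TS_WE_L": 1,
    "P%TS_TAG_H<31:17>": 15, "P%TS_ECC_H<5:0>": 6,
    "P%TS_OWNED_H": 1, "P%TS_VALID_H": 1,
}
DATA_RAM_PINS = {
    "P%DR_INDEX_H<20:3>": 18, "P%DR_OE_L": 1, "P%DR_WE_L": 1,
    "P%DR_DATA_H<63:0>": 64, "P%DR_ECC_H<7:0>": 8,
}


def ram_parts(io_bits: int, part_width: int = 4) -> int:
    """Number of RAM parts needed to store io_bits per location (rounded up)."""
    return -(-io_bits // part_width)


# Bits stored per tag entry: tag + ECC + owned + valid.
TAG_IO_BITS = 15 + 6 + 1 + 1
# Bits stored per quadword: data + ECC.
DATA_IO_BITS = 64 + 8
```

With x4 parts, the 23 tag store bits need 6 parts and the 72 data bits need 18 parts, which agrees with the receiver counts in the Type column.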



13.3.1.1 P%TS_INDEX_H<20:5> 

These pins drive the address lines of the tag RAMs, thus indexing into one row of the tag store. 
The value driven depends upon the corresponding bits in the address of the memory or IPR 
reference being done. 

P%TS_INDEX_H<16:5> are used for every cache configuration. P%TS_INDEX_H<20:17> are 
used based on the cache size selected. When the cache size selected is smaller than 2 megabytes, 
some or all of these four bits are driven to 0 rather than to the value given in the address. This 
is shown in Table 13-7. 

Table 13-7: Usage of P%TS_INDEX_H<20:5> based on cache size 

                P%TS_INDEX_H bits driven
Cache size      unconditionally to 0       P%TS_INDEX_H bits used
----------      ------------------------   ----------------------
128 kilobytes   P%TS_INDEX_H<20:17>        P%TS_INDEX_H<16:5>
256 kilobytes   P%TS_INDEX_H<20:18>        P%TS_INDEX_H<17:5>
512 kilobytes   P%TS_INDEX_H<20:19>        P%TS_INDEX_H<18:5>
2 megabytes     None                       P%TS_INDEX_H<20:5>



P%TS_INDEX_H<20:5> are driven by NVAX and received by up to 6 RAM chips. 
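
The way the unused upper index bits are forced to 0 can be sketched as a mask applied to the reference address. This is an illustrative model only: the helper names are hypothetical, and each function returns the driven bits in their original address positions. The data RAM variant assumes the same upper-bit rule applies to P%DR_INDEX_H<20:3>.

```python
# Most significant index bit actually used for each cache size.
TOP_INDEX_BIT = {
    128 * 1024: 16,
    256 * 1024: 17,
    512 * 1024: 18,
    2 * 1024 * 1024: 20,
}


def _driven_bits(pa: int, cache_bytes: int, low_bit: int) -> int:
    msb = TOP_INDEX_BIT[cache_bytes]
    # Keep bits <msb:low_bit>; everything above msb is driven to 0.
    mask = ((1 << (msb + 1)) - 1) & ~((1 << low_bit) - 1)
    return pa & mask


def ts_index_pins(pa: int, cache_bytes: int) -> int:
    """Address bits driven on P%TS_INDEX_H<20:5>."""
    return _driven_bits(pa, cache_bytes, 5)


def dr_index_pins(pa: int, cache_bytes: int) -> int:
    """Address bits driven on P%DR_INDEX_H<20:3>."""
    return _driven_bits(pa, cache_bytes, 3)
```

For an address with bits <20:0> all set, the 128-kilobyte configuration drives only bits <16:5> on the tag index pins, while the 2-megabyte configuration drives all of bits <20:5>.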



13.3.1.2 P%TS_OE_L 

P%TS_OE_L (Tag Store Output Enable) is an output pin which controls the tag store RAMs. 
It enables the RAMs to drive their outputs. It is asserted (driven low) when the tag store 
is being read, and allows the tag store to drive P%TS_TAG_H<31:17>, P%TS_ECC_H<5:0>, 
P%TS_OWNED_H and P%TS_VALID_H. When the tag store is being written, P%TS_OE_L is 
deasserted (driven high). 

P%TS_OE_L is driven by NVAX and received by up to 6 RAM chips. 

13.3.1.3 P%TS_WE_L 

P%TS_WE_L (Tag Store Write Enable) is an output pin which, when asserted, enables the tag 
store RAMs to be written. It is asserted (driven low) during writes of the tag store. 

P%TS_WE_L is driven by NVAX and received by up to 6 RAM chips. 

13.3.1.4 P%TS_TAG_H<31:17> 

P%TS_TAG_H<31:17> are I/O pins which are used to transfer the cache tag to and from the tag 
store RAMs. When the tag store is being written, P%TS_TAG_H<31:17> are used as outputs; 
when the tag store is being read, P%TS_TAG_H<31:17> are used as inputs. 

Some of the tag lines are not used when the cache is bigger than 128 kilobytes, as shown in 
Table 13-8. When this is the case, the board designer does not need to connect the pin at all on 
the board. The pin is pulled low through a resistor in the pad so that internal to the Cbox, the 
unused tag lines are recognized as zeros when the tag is read. 



Table 13-8: Usage of P%TS_TAG_H<20:17> based on cache size 

Cache size      Unused P%TS_TAG_H pins
----------      ----------------------
128 kilobytes   None
256 kilobytes   P%TS_TAG_H<17>
512 kilobytes   P%TS_TAG_H<18:17>
2 megabytes     P%TS_TAG_H<20:17>

All of the P%TS_TAG_H<31:17> pads are built with internal resistors, for chip layout consistency. 

Each P%TS_TAG_H pin is connected to one RAM I/O pin. A system designer who intends to 
run NVAX only in 30-bit mode can leave P%TS_TAG_H<31:29> unconnected, and they will be 
pulled low internally so that the Cbox sees a zero value. 

13.3.1.5 P%TS_ECC_H<5:0> 

P%TS_ECC_H<5:0> are I/O pins which are used to transfer the ECC check bits to and from the 
tag store RAMs. When the tag store is being written, P%TS_ECC_H<5:0> are used as outputs; 
when the tag store is being read, P%TS_ECC_H<5:0> are used as inputs. 

Each P%TS_ECC_H pin is connected to one RAM I/O pin. 

13.3.1.6 P%TS_OWNED_H 

P%TS_OWNED_H is an I/O pin which is used to transfer the ownership bit to and from the tag 
store RAMs. When the tag store is being written, P%TS_OWNED_H is used as an output; when 
the tag store is being read, P%TS_OWNED_H is used as an input. 

P%TS_OWNED_H is connected to one RAM I/O pin. 

13.3.1.7 P%TS_VALID_H 

P%TS_VALID_H is an I/O pin which is used to transfer the valid bit to and from the tag store 
RAMs. When the tag store is being written, P%TS_VALID_H is used as an output; when the 
tag store is being read, P%TS_VALID_H is used as an input. 

P%TS_VALID_H is connected to one RAM I/O pin. 

13.3.1.8 P%DR_INDEX_H<20:3> 

These pins drive the address lines of the data RAMs, thus indexing into one row of the data store. 
The value driven depends upon the corresponding bits in the address of the memory reference 
being done. 

P%DR_INDEX_H<16:3> are used for every cache configuration. P%DR_INDEX_H<20:17> are 
used based on the cache size selected. When the cache size selected is smaller than 2 megabytes, 
some or all of these four bits are driven to 0 rather than to the value given in the address. This 
is shown in Table 13-9. 

Table 13-9: Usage of P%DR_INDEX_H<20:3> based on cache size 

                P%DR_INDEX_H bits driven
Cache size      unconditionally to 0       P%DR_INDEX_H bits used
----------      ------------------------   ----------------------
128 kilobytes   P%DR_INDEX_H<20:17>        P%DR_INDEX_H<16:3>
256 kilobytes   P%DR_INDEX_H<20:18>        P%DR_INDEX_H<17:3>
512 kilobytes   P%DR_INDEX_H<20:19>        P%DR_INDEX_H<18:3>
2 megabytes     None                       P%DR_INDEX_H<20:3>

P%DR_INDEX_H<20:3> are driven by NVAX and received by 18 RAM chips. 



13.3.1.9 P%DR_OE_L 

P%DR_OE_L (Data RAM Output Enable) is an output pin which controls the data RAMs. It 
enables the RAMs to drive their outputs. It is asserted (driven low) when the data RAMs are being 
read, and allows the data RAMs to drive P%DR_DATA_H<63:0> and P%DR_ECC_H<7:0>. 
When the data RAMs are being written, P%DR_OE_L is deasserted (driven high). 

P%DR_OE_L is driven by NVAX and received by 18 RAM chips. 

13.3.1.10 P%DR_WE_L 

P%DR_WE_L (Data RAM Write Enable) is an output pin which, when asserted, enables the data 
RAMs to be written. It is asserted (driven low) during writes of the data RAMs. 

P%DR_WE_L is driven by NVAX and received by 18 RAM chips. 

13.3.1.11 P%DR_DATA_H<63:0> 

P%DR_DATA_H<63:0> are I/O pins which are used to transfer the cache data to and from 
the data RAMs. When the data RAMs are being written, P%DR_DATA_H<63:0> are used as 
outputs; when the data RAMs are being read, P%DR_DATA_H<63:0> are used as inputs. 

Each one of P%DR_DATA_H<63:0> is connected to one RAM I/O pin. 

13.3.1.12 P%DR_ECC_H<7:0> 

P%DR_ECC_H<7:0> are I/O pins which are used to transfer the data ECC to and from the data 
store. When the data store is being written, P%DR_ECC_H<7:0> are used as outputs; when the 
data store is being read, P%DR_ECC_H<7:0> are used as inputs. 

Each one of P%DR_ECC_H<7:0> is connected to one RAM I/O pin. 

13.3.2 Backup Cache Block Diagrams 

Figure 13-4 and Figure 13-5 show the connections to the tag store and data RAMs and the way 
the address is used for the 128-kilobyte cache. 

Figure 13-4: Tags and Data for 128-Kilobyte Cache 

[Block diagram. The tag store (6 parts, 4K x 4) is connected to P%TS_INDEX_H<16:5>, 
P%TS_OE_L, P%TS_WE_L, P%TS_TAG_H<31:17>, P%TS_OWNED_H, P%TS_VALID_H, and 
P%TS_ECC_H<5:0>. The data RAMs (18 parts, 16K x 4) are connected to 
P%DR_INDEX_H<16:3>, P%DR_OE_L, P%DR_WE_L, P%DR_DATA_H<63:0>, and 
P%DR_ECC_H<7:0>.] 






Figure 13-5: Address as used for 128-Kilobyte Cache 

 31              17 16                                 05 04 03 02    00
+------------------+-------------------------------------+-----+--------+
|  tag - 15 bits   | data and tag store index - 12 bits  |     | UNUSED |
+------------------+-------------------------------------+-----+--------+
                                                            |
           used to address data quadword within hexaword ---'
           unused for tag store 

Figure 13-6 and Figure 13-7 show the connections to the tag store and data RAMs and the way 
the address is used for the 256-kilobyte cache. 

Figure 13-6: Tags and Data for 256-Kilobyte Cache 

Figure 13-7: Address as used for 256-Kilobyte Cache 

 31              18 17                                 05 04 03 02    00
+------------------+-------------------------------------+-----+--------+
|  tag - 14 bits   | data and tag store index - 13 bits  |     | UNUSED |
+------------------+-------------------------------------+-----+--------+
                                                            |
           used to address data quadword within hexaword ---'
           unused for tag store 

Figure 13-8 and Figure 13-9 show the connections to the tag store and data RAMs and the way 
the address is used for the 512-kilobyte cache. 

Figure 13-8: Tags and Data for 512-Kilobyte Cache 

Figure 13-9: Address as used for 512-Kilobyte Cache 

 31              19 18                                 05 04 03 02    00
+------------------+-------------------------------------+-----+--------+
|  tag - 13 bits   | data and tag store index - 14 bits  |     | UNUSED |
+------------------+-------------------------------------+-----+--------+
                                                            |
           used to address data quadword within hexaword ---'
           unused for tag store 

Figure 13-10 and Figure 13-11 show the connections to the tags and data RAMs and the way the 
address is used for the 2-megabyte cache. 

Figure 13-10: Tags and Data for 2-Megabyte Cache 

Figure 13-11: Address as used for 2-Megabyte Cache 

 31              21 20                                 05 04 03 02    00
+------------------+-------------------------------------+-----+--------+
|  tag - 11 bits   | data and tag store index - 16 bits  |     | UNUSED |
+------------------+-------------------------------------+-----+--------+
                                                            |
           used to address data quadword within hexaword ---'
           unused for tag store 
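
The four address layouts differ only in where the boundary between tag and index falls. The following Python sketch restates that split for a 32-bit physical address. It is illustrative only: `split_address`, `BcacheAddress`, and `INDEX_BITS` are hypothetical names, and the index widths are the ones given in the address figures (12, 13, 14, and 16 bits).

```python
from typing import NamedTuple

# Width of the data and tag store index for each cache size in kilobytes,
# as given in the address figures. Hypothetical helper names.
INDEX_BITS = {128: 12, 256: 13, 512: 14, 2048: 16}


class BcacheAddress(NamedTuple):
    tag: int        # upper address bits, compared against the tag store
    index: int      # selects one row of the tag store and data RAMs
    quadword: int   # address bits <4:3>, quadword within the hexaword


def split_address(address: int, cache_kb: int) -> BcacheAddress:
    """Split a 32-bit physical address into tag, index, and quadword select."""
    n = INDEX_BITS[cache_kb]
    address &= 0xFFFFFFFF
    quadword = (address >> 3) & 0x3             # bits <4:3>
    index = (address >> 5) & ((1 << n) - 1)     # n bits starting at bit 5
    tag = address >> (5 + n)                    # remaining upper bits
    return BcacheAddress(tag, index, quadword)
```

Address bits <2:0> do not appear in the result, matching the UNUSED field in the figures.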
13.4 The Cbox Datapath 

The Cbox includes datapath and control for interfacing to the Mbox, the cache RAMs, and to the 
NDAL. The portion of the Cbox which primarily interfaces to the Mbox and the cache RAMs will 
be referred to here as the Cbox proper, while the portion of the Cbox which primarily interfaces 
to the NDAL will be referred to as the BIU. 

The Cbox datapath is organized around a number of queues and latches, an address bus and a 
data bus in the Cbox proper, and an address bus and a data bus in the BIU. Separate access is 
provided to the tag store and the data RAMs. 

Table 13-10 lists the Cbox queues and the major latches. Each is covered in more detail later in 
the section. The IPRs are not covered here, as they are covered in Section 13.5. 

Table 13-10: Cbox Queues and Major Latches 

Queue/Latch          Entries  Address/Data         Function 
-------------------  -------  -------------------  ------------------------------------------ 
CM_OUT_LATCH                  Address<31:3> and    Holds fill data or an invalidate address 
                              data<63:0>           being sent to the Mbox. 

FILL_DATA_PIPEs               Data<63:0>           Pipeline data destined for the Mbox. 

DREAD_LATCH                   Address<31:0>        Holds a data-stream read request from 
                                                   the Mbox. 

IREAD_LATCH                   Address<31:0>        Holds an instruction-stream read request 
                                                   from the Mbox. 

WRITE_PACKER                  Address<31:0> and    Compresses sequential memory writes to 
                              data<63:0>           the same quadword. 

WRITE_QUEUE                   Address<31:0> and    Queues write requests from the Mbox. 
                              data<63:0> 

FILL_CAM                      Address<31:3>        Holds addresses for read or write misses 
                                                   which have resulted in a read to 
                                                   memory; one may hold the address of an 
                                                   in-progress DREAD_LOCK which has no 
                                                   memory request outstanding. 

NDAL_IN_QUEUE        10       Address<31:5> or     Holds up to 8 quadword fills and up to 2 
                              data<63:0>           coherence transactions from the NDAL. 

WRITEBACK_QUEUE      2        Address<31:3> and    Holds writeback addresses and data to be 
                              data<63:0> times 4   driven on the NDAL. The queue holds up 
                                                   to 2 hexaword writebacks. It is also used 
                                                   for quadword WDISOWNs. 

NON_WRITEBACK_QUEUE  2        Address<31:0> and    The NON_WRITEBACK_QUEUE holds 
                              data<63:0>           all non-WDISOWN transactions destined 
                                                   for the NDAL. This includes reads, I/O 
                                                   space transactions, and normal writes 
                                                   which are done when the cache is off or 
                                                   in ETM. 



It can be seen from Table 13-10 that some of the queues contain address and data entries 
in parallel (CM_OUT_LATCH, WRITE_PACKER, WRITE_QUEUE, WRITEBACK_QUEUE, 
NON_WRITEBACK_QUEUE), some contain either addresses or data (NDAL_IN_QUEUE), some 
contain only data (FILL_DATA_PIPE), and some contain only addresses (DREAD_LATCH, 
IREAD_LATCH, FILL_CAM). 

The Cbox is organized around an address datapath and a data datapath. A block diagram of the 
data datapath is given in Figure 13-12, and a block diagram of the address datapath is given in 
Figure 13-13. 

There are five major busses in the Cbox: C_BUS%DBUS_H<63:0>, C_BUS%BIU_DATA_H<63:0>, 
C_ADC%ABUS_H<31:0>, C_ADC%BIU_ADDR_OUT_H<31:0>, and C_BIU%ADC_ADDR_IN_H<31:0>. The first 
two transfer data, and the last three transfer addresses. From the block diagrams, it can be seen 
which of the latches and queues are connected to which busses. Transfers between the address 
busses and the data busses take place only through the Abus/Dbus XFER block, which is in the BIU. 

The data flows may be understood by examining Figure 13-12. Write data enters the Cbox 
through the WRITE_QUEUE and is written into the data RAMs. When a writeback of a block 
occurs, data is read out of the data RAMs, transferred to the WRITEBACK_QUEUE in the BIU, 
and is driven onto the NDAL. 

When read data is read from the backup cache, it is sent to the Mbox through the 
CM_OUT_LATCH. When read data returns from memory, it enters the Cbox through the 
NDAL_IN_QUEUE, is driven across C_BUS%BIU_DATA_H<63:0> to C_BUS%DBUS_H<63:0> and into 
the data RAMs, as well as to the Mbox through the CM_OUT_LATCH. 

When the Bcache is off, write data is sent from the WRITE_QUEUE directly to the 
NON_WRITEBACK_QUEUE and to memory, bypassing the cache entirely. 

The last data flow of significance has to do with the reading and writing of IPRs. The Dbus IPRs 
and the NDAL IPRs are read and written directly from the data datapath. 

The address flows may be understood by examining Figure 13-13. Address bits <31:3> are used 
for memory space reads and writes, which always address a quadword boundary. Address bits 
<31:0> are used for I/O space reads and writes, which may address individual bytes. 

Read addresses arrive through the IREAD_LATCH and the DREAD_LATCH, and write addresses 
arrive via the WRITE_QUEUE. Each address is driven across C_ADC%ABUS_H<31:0> to the tag 
RAMs, where it is looked up so that hit or miss can be determined. The index portion of the 
address is also driven to the data RAMs in case of a hit. 

If a read or a write results in a hit, the data is sent back to the Mbox via the CM_OUT_LATCH. 
The requested quadword is always sent first on a Bcache hit. Bits <4:3> are driven onto 
C%MBOX_FILL_QW_H<4:3> to enable the Mbox to distinguish between quadwords within a 
hexaword. The most significant bits are not driven for fill data, as the Mbox knows from its 
miss latches and the fill command (D_CF or I_CF) which hexaword address the data corresponds 
to. 

If the read or write does not result in a Bcache hit, the miss address is loaded into the FILL_CAM, 
which holds addresses of outstanding read and write misses; the address is also driven to the 
BIU, where it enters the NON_WRITEBACK_QUEUE to be driven onto the NDAL. When the 
fill data returns, the value of the NDAL signal P%ED_H<0> is used to locate the correct one of 
the two addresses in the FILL_CAM so that the data RAMs and the tag RAMs may be written. 
The address is driven out of the FILL_CAM to index the tag and data RAMs. 

Another address-type operation occurs when a cache coherency transaction appears on the NDAL. 
In this case, the address comes in through the NDAL_IN_QUEUE and is driven from the BIU 
to the Cbox proper through the CBOX_BIU_INTERFACE. The address is looked up in the tag 
RAMs, and if it hits, the address is sent through the CM_OUT_LATCH to the Mbox for a Pcache 
invalidate. If necessary, the VALID and/or OWNED bit is cleared for the Bcache entry. Only 
address bits <31:5> are used for invalidates, as the invalidate is always to a hexaword. 

If a writeback is required, the index is driven to the data RAMs so the data can be read out. The 
address is then driven to the WRITEBACK_QUEUE for the writeback; it is followed shortly by 
the writeback data on the data busses. 

When Abus IPRs are read or written, the address busses and the data busses come into 
play. When an Abus IPR is read, the data is driven onto C_ADC%ABUS_H<31:0> and then to 
C_ADC%BIU_ADDR_OUT_H<31:0>. The BIU uses the Abus/Dbus XFER block to transfer the data to 
C_BUS%BIU_DATA_H<63:0>; it then goes to C_BUS%DBUS_H<63:0> and back to the Mbox through the 
CM_OUT_LATCH. 

When an Abus IPR is written, the data is driven from the Mbox through the WRITE_QUEUE, 
to C_BUS%DBUS_H<63:0>, and to C_BUS%BIU_DATA_H<63:0>. The Abus/Dbus XFER block transfers 
the data to C_ADC%BIU_ADDR_OUT_H<31:0>, and it is then driven to C_ADC%ABUS_H<31:0> so that 
it can be written into the register. 

The byte mask is received from the Mbox for writes and I/O space reads. It is passed through the 
Cbox and onto the NDAL for writes when the cache is off or in ETM, and it is passed through to 
the NDAL for all I/O space transactions. 

Figure 13-12: Cbox block diagram with DATA_BUS 

Figure 13-13: Cbox block diagram with ADDRESS_BUS 

13.4.1 Mbox Interface 

All NVAX CPU chip transactions for the Cbox arrive through the Cbox-Mbox interface. Reads 
come from the Mbox to the Cbox through the read latches. Writes arrive through the 
WRITE_PACKER and the WRITE_QUEUE. All fills returning from the Cbox to the Mbox go 
through the CM_OUT_LATCH. 

A block diagram of the Mbox interface is shown in Figure 13-14. 

Figure 13-14: Mbox Interface 

When the Mbox has a command for the Cbox, the command appears on M%S6_CMD_H<4:0>. 
M%CBOX_REF_ENABLE_L is asserted for all reads, IPR_RDs, and IPR_WRs. It is not asserted 
for writes since the Cbox accepts all writes from the Mbox. The Cbox loads the address from 
M%S6_PA_H<31:3> and M%C_S6_PA_H<2:0> into either the IREAD_LATCH, the DREAD_LATCH, or 
the WRITE_PACKER. If the command is a write, the Cbox loads the data from B%S6_DATA_H and 
the byte enable from M%S6_BYTE_MASK_H into the WRITE_PACKER. 

Table 13-11 shows the commands which pass between the Mbox and the Cbox. 

Table 13-11: Mbox-Cbox Commands 

Command           Description                           Cbox datapath element involved 
----------------  ------------------------------------  ------------------------------ 
Mbox to Cbox commands driven on M%S6_CMD_H<4:0> 

IREAD (1)         Instruction stream read               IREAD_LATCH 
DREAD (1)         Data stream read                      DREAD_LATCH 
DREAD_MODIFY (1)  Data stream read with modify intent   DREAD_LATCH 
DREAD_LOCK (1)    Interlocked data stream read          DREAD_LATCH 
WRITE_UNLOCK      Write which releases lock             WRITE_PACKER, WRITE_QUEUE 
WRITE             Normal write                          WRITE_PACKER, WRITE_QUEUE 
IPR_RD (1)        Read of an internal or                DREAD_LATCH 
                  external processor register 
IPR_WR (1)        Write of an internal or               WRITE_PACKER, WRITE_QUEUE 
                  external processor register 

Cbox to Mbox commands driven on C%CBOX_CMD_H<1:0> 

D_CF              Data stream cache fill                CM_OUT_LATCH 
I_CF              Instruction stream cache fill         CM_OUT_LATCH 
INVAL             Hexaword invalidate                   CM_OUT_LATCH 
NOP               No operation. 

(1) Qualified by M%CBOX_REF_ENABLE_L. 

13.4.1.1 Mbox to Cbox Transactions 

The Mbox commands and accompanying control and data signals are shown in Table 13-12. 
M%CBOX_REF_ENABLE_L and M%CBOX_LATE_EN_H are used to enable certain transactions coming 
to the Cbox; M%CBOX_LATE_EN_H is only used for transactions which may hit in the Pcache. From 
the table, it may be seen that the assertion of M%CBOX_REF_ENABLE_L is not necessary for writes 
and write unlocks, and that M%CBOX_LATE_EN_H is only used for DREADs, IREADs, and READ 
MODIFYs. M%S6_BYTE_MASK_H<7:0> is valid for all transactions, although B%S6_DATA_H<63:0> is 
not valid for read transactions. 

Table 13-12: Mbox to Cbox Command Matrix 

                                  Mbox-driven Signal or Bus 

                                                        M%S6_PA_H<31:3>, 
M%S6_CMD_H<4:0>  M%CBOX_REF_ENABLE_L  M%CBOX_LATE_EN_H  M%C_S6_PA_H<2:0>  M%S6_BYTE_MASK_H<7:0>  B%S6_DATA_H<63:0> 
---------------  -------------------  ----------------  ----------------  ---------------------  ----------------- 
DREAD            valid (1)            valid             valid             valid                  X (2) 
READ MODIFY      valid                valid             valid             valid                  X 
IREAD            valid                valid             valid             valid                  X 
READ LOCK        valid                0 (3)             valid             valid                  X 
IPR READ         valid                0                 valid             valid                  X 
IPR WRITE        valid                0                 valid             valid                  valid 
WRITE            X                    X                 valid             valid                  valid 
WRITE UNLOCK     X                    X                 valid             valid                  valid 
OTHER            X                    X                 X                 X                      X 

(1) "valid" denotes that the signal is either asserted or deasserted by the Mbox, and the Cbox interprets it appropriately. 
(2) "X" denotes that the Mbox may drive any value to the Cbox, and the Cbox does not care what value is driven. 
(3) "0" denotes that the Mbox never asserts the signal in this case. 
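
As a reading aid, the command matrix can be restated as a small Python lookup. This is only a sketch of how the table might be consulted in a behavioral model; the names `MATRIX`, `COLUMNS`, and `signals_to_sample` are hypothetical, and the cell values are copied from Table 13-12.

```python
V, X, Z = "valid", "X", "0"   # the three cell values used in Table 13-12

COLUMNS = (
    "M%CBOX_REF_ENABLE_L",
    "M%CBOX_LATE_EN_H",
    "M%S6_PA_H<31:3>, M%C_S6_PA_H<2:0>",
    "M%S6_BYTE_MASK_H<7:0>",
    "B%S6_DATA_H<63:0>",
)

MATRIX = {
    "DREAD":        (V, V, V, V, X),
    "READ MODIFY":  (V, V, V, V, X),
    "IREAD":        (V, V, V, V, X),
    "READ LOCK":    (V, Z, V, V, X),
    "IPR READ":     (V, Z, V, V, X),
    "IPR WRITE":    (V, Z, V, V, V),
    "WRITE":        (X, X, V, V, V),
    "WRITE UNLOCK": (X, X, V, V, V),
}
OTHER = (X, X, X, X, X)   # any command not listed above


def signals_to_sample(command: str) -> list:
    """Return the signals whose value the Cbox interprets for this command."""
    row = MATRIX.get(command, OTHER)
    return [name for name, cell in zip(COLUMNS, row) if cell == V]
```

For instance, `signals_to_sample("WRITE")` omits both enable signals, reflecting that the Cbox accepts all writes, while a DREAD omits B%S6_DATA_H<63:0>.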



13.4.1.1.1 The IREAD_LATCH and the DREAD_LATCH 

When the Mbox has a read command for the Cbox, the Cbox loads the address from 
M%S6_PA_H<31:3> and M%C_S6_PA_H<2:0> into either the IREAD_LATCH or the DREAD_LATCH, 
depending on the command. Only IREADs are loaded into the IREAD_LATCH. The 
DREAD_LATCH is used for DREAD, DREAD_MODIFY, DREAD_LOCK, and IPR_READ. 

The Mbox only has one outstanding IREAD and one outstanding DREAD at a time, so no 
backpressure for the latches is needed. When the DREAD_LATCH is valid, the Mbox does 
not start the next DREAD-type transaction until all fill data from the previous command is 
returned to the Mbox. When the IREAD_LATCH is valid, the Mbox does not start the next 
IREAD transaction until either the IREAD has been aborted or all fill data from the IREAD is 
returned to the Mbox. 

The Cbox services a read hit from the read latch; a read miss is transferred to the FILL_CAM 
where it awaits the arrival of data from memory. Table 13-13 and Table 13-14 show the fields 
which are contained in the two read latches. 

Table 13-13: IREAD_LATCH Fields 

Field           Purpose 
--------------  ------------------------------------------ 
ADDRESS<31:0>   Physical address of the read request. 
CMD<4:0>        Specific command being done (IREAD). 

Table 13-14: DREAD_LATCH Fields 

Field           Purpose 
--------------  ------------------------------------------ 
CMD<4:0>        Specific command being done (DREAD, DREAD_MODIFY, DREAD_LOCK, 
                IPR_READ). 
ADDRESS<31:0>   Physical address of the read request. 



When the Mbox asserts M%ABORT_CBOX_IRD_H, the Cbox clears the IREAD_LATCH entry if the 
reference has not yet started. If the Cbox is in the middle of the tag store lookup or in the middle 
of a hit sequence and returning the IREAD fill data, it aborts the lookup or the data sequence. If a 
miss has already been initiated, the Cbox continues with the fills to the backup cache but does 
not send any data to the Mbox. 
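
The three abort cases above can be summarized in a short Python sketch. The phase names and the `abort_iread` function are hypothetical labels introduced for illustration; the specification describes the behavior in prose only.

```python
from enum import Enum, auto
from typing import NamedTuple


class IreadPhase(Enum):
    """Hypothetical labels for where an IREAD is when the abort arrives."""
    NOT_STARTED = auto()
    TAG_LOOKUP = auto()
    HIT_RETURNING_DATA = auto()
    MISS_INITIATED = auto()


class AbortOutcome(NamedTuple):
    action: str
    bcache_fill_continues: bool
    data_sent_to_mbox: bool


def abort_iread(phase: IreadPhase) -> AbortOutcome:
    """Describe the Cbox response to M%ABORT_CBOX_IRD_H in a given phase."""
    if phase is IreadPhase.NOT_STARTED:
        return AbortOutcome("clear IREAD_LATCH entry", False, False)
    if phase in (IreadPhase.TAG_LOOKUP, IreadPhase.HIT_RETURNING_DATA):
        return AbortOutcome("abort lookup or data sequence", False, False)
    # A miss is already outstanding: the backup cache is still filled,
    # but nothing further is returned to the Mbox.
    return AbortOutcome("continue fills to backup cache", True, False)
```

In every case the Mbox receives no further data for the aborted IREAD; only the miss case leaves work in progress inside the Cbox.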



13.4.1.1.2 WRITE_PACKER and WRITE_QUEUE 

Writes from the Mbox go through the WRITE_PACKER and into the WRITE_QUEUE. The 
WRITE_PACKER holds one quadword of data; the WRITE_QUEUE consists of 8 entries, each 
of which contains a quadword of data. The purpose of the WRITE_PACKER is to accumulate 
memory-space writes to the same quadword which arrive sequentially, so that only one write has 
to be done into the cache. Performance modelling shows that this can reduce by 70% the number 
of writes done to the backup cache. 

Only normal WRITE commands to the same quadword are packed together. Other writes 
pass immediately from the WRITE_PACKER into the WRITE_QUEUE. The WRITE_PACKER 
is flushed at the following times: 

• When a memory-space WRITE to a different quadword arrives. The new quadword then 
remains in the write packer until a write packer flush condition is met. 

• When a WRITE_UNLOCK arrives. The WRITE_UNLOCK is then passed immediately from 
the WRITE_PACKER to the WRITE_QUEUE. 

• When an I/O space write arrives. The I/O space write is then passed immediately from the 
WRITE_PACKER to the WRITE_QUEUE. 

• When an IPR_WRITE arrives. The IPR_WRITE is then passed immediately from the 
WRITE_PACKER to the WRITE_QUEUE. 

• If an IREAD or a DREAD arrives to the same hexaword as that of the entry in the 
WRITE_PACKER. 

• Whenever any condition for flushing the write queue is met on the entry in the 
WRITE_PACKER. 

• If the DISABLE_PACK bit in the CCTL IPR is set. In this case, every write passes directly 
through the WRITE_PACKER without delay. 
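
A behavioral sketch of the packing and flush rules listed above is given below in Python. It is a simplified model under stated assumptions, not the hardware implementation: the class `WritePacker`, its method names, and the list standing in for the WRITE_QUEUE are hypothetical, each write is assumed to present its bytes already positioned in their lanes of the 64-bit quadword, and the general write-queue flush conditions are not modeled.

```python
def merge_bytes(old_mask, old_data, new_mask, new_data):
    """Overlay the bytes enabled in new_mask onto the packed quadword."""
    lane_mask = 0
    for i in range(8):
        if (new_mask >> i) & 1:
            lane_mask |= 0xFF << (8 * i)
    return old_mask | new_mask, (old_data & ~lane_mask) | (new_data & lane_mask)


class WritePacker:
    """Hypothetical one-entry model of the WRITE_PACKER."""

    def __init__(self, disable_pack=False):
        self.disable_pack = disable_pack   # models DISABLE_PACK in CCTL
        self.entry = None                  # [quadword_address, byte_mask, data]
        self.queue = []                    # stands in for the WRITE_QUEUE

    def flush(self):
        if self.entry is not None:
            self.queue.append(("WRITE", *self.entry))
            self.entry = None

    def write(self, command, address, byte_mask, data, io_space=False):
        quadword = address & ~0x7
        packable = command == "WRITE" and not io_space and not self.disable_pack
        if packable and self.entry is not None and self.entry[0] == quadword:
            self.entry[1], self.entry[2] = merge_bytes(
                self.entry[1], self.entry[2], byte_mask, data)
            return
        self.flush()                       # any other write flushes the packer
        if packable:
            self.entry = [quadword, byte_mask, data]
        else:                              # passes straight through to the queue
            self.queue.append((command, address, byte_mask, data))

    def read(self, address):
        """An IREAD or DREAD to the same hexaword flushes the packer."""
        if self.entry is not None and (self.entry[0] >> 5) == (address >> 5):
            self.flush()
```

Two longword writes to the same quadword therefore reach the queue as a single entry, which is the effect the packer is intended to achieve.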

THREE-CYCLE LATENCY THROUGH THE WRITE_QUEUE 

If the WRITE_QUEUE and the WRITE_PACKER are empty, the latency of any write 
through them is 3 cycles. The implication of this is that if any reads which flush 
the WRITE_QUEUE are done alternately with writes, their execution will be greatly 
slowed. This applies to IPR reads and writes and may be an issue in testing the chip. 

Table 13-15 describes the fields in the WRITE_QUEUE. 

Table 13-15: WRITE_QUEUE Fields 

Field           Purpose 
--------------  ------------------------------------------------------------------ 
VALID           Indicates that the entry contains valid information. 
DWR_CONFLICT    Indicates that this write conflicts with a DREAD, giving the WRITE_QUEUE 
                priority. Check is done using hexaword address. 
IWR_CONFLICT    Indicates that this write conflicts with an IREAD, giving the WRITE_QUEUE 
                priority. Check is done using hexaword address. 
CMD<4:0>        Specific command being done. 
ADDRESS<31:0>   Physical address of the write. 
BYTE_EN<7:0>    Byte enable for the write. 
DATA<63:0>      Data to be written. 



When a quadword of data is moved into the WRITE_QUEUE, it is serviced by the Cbox arbiter 
as the lowest-priority task, unless special conditions exist. 

Servicing writes separately from reads allows reads to take higher priority and gets read data 
back to the CPU faster. However, a read which follows a write to the same hexaword must 
not be allowed to complete before the write completes. To prevent this there are conflict bits, 
DWR_CONFLICT<8:0> and IWR_CONFLICT<8:0>, implemented in the WRITE_QUEUE and 
WRITE_PACKER, one for each entry. The conflict bits ensure correct ordering between writes 
and a DREAD or an IREAD to the same hexaword. 

When a DREAD arrives, the hexaword address is checked against all entries in the 
WRITE_QUEUE and WRITE_PACKER. Any entry with a matching hexaword address has 
its corresponding DWR_CONFLICT bit set. The DWR_CONFLICT bit is also set if the 
WRITE_QUEUE entry is an IPR_WRITE, a WRITE_UNLOCK, or an I/O space write. If any 
DWR_CONFLICT bit is set, the WRITE_QUEUE takes priority over DREADs, allowing the writes 
up to the point of the conflicting write to complete first. 

When an IREAD arrives, the hexaword address is checked against all entries in the 
WRITE_QUEUE and WRITE_PACKER. Any entry with a matching hexaword address has 
its corresponding IWR_CONFLICT bit set. The IWR_CONFLICT bit is also set if the 
WRITE_QUEUE entry is an IPR_WRITE, a WRITE_UNLOCK, or an I/O space write. If any 
IWR_CONFLICT bit is set, the WRITE_QUEUE takes priority over IREADs, allowing the writes 
up to the point of the conflicting write to complete first. 

As each write is done, the conflict bits and valid bit of the entry are cleared. When the last write 
which conflicts with a DREAD finishes, there are no more DWR_CONFLICT bits set, and the 
DREAD takes priority again, even if other writes arrived after the DREAD. In this way a DREAD 
which conflicts with previous writes is not done until those writes are done, but once those writes 
are done, the DREAD proceeds. 

The analogous statement is true for an IREAD which has a conflict. If IWR_CONFLICT is set and 
the IREAD is aborted before the conflicting write queue entry is processed, the WRITE_QUEUE 
continues to take precedence over the IREAD_LATCH until the conflicting entry is retired. 

If both a DREAD and an IREAD have a conflict in the WRITE_QUEUE, writes take priority until 
one of the reads no longer has a conflict. If the DREAD no longer has a conflict, the DREAD is 
then done. Then the WRITE_QUEUE continues to have priority over the IREAD_LATCH since 
the IREAD has a conflict, and when the conflicting writes are done, the IREAD may proceed. If 
another DREAD arrives in the meantime, it may be allowed to bypass both the writes and the 
IREAD if it has no conflicts. 
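
The conflict-bit mechanism can be modeled in a few lines of Python. The sketch below is an illustration under simplifying assumptions: `WriteQueueModel` and its methods are hypothetical names, the WRITE_PACKER entry is treated as just another queue entry, and the `forces_order` flag stands for an entry that is an IPR_WRITE, a WRITE_UNLOCK, or an I/O space write.

```python
class WriteQueueModel:
    """Hypothetical model of DWR_CONFLICT ("D") and IWR_CONFLICT ("I") bits."""

    def __init__(self):
        self.entries = []   # oldest write first

    def push_write(self, address, forces_order=False):
        # Conflict bits start clear; they are set only when a read arrives.
        self.entries.append(
            {"address": address, "forces_order": forces_order,
             "D": False, "I": False})

    def read_arrives(self, stream, address):
        """Set the conflict bit for stream "D" or "I" on matching entries."""
        for entry in self.entries:
            same_hexaword = (entry["address"] >> 5) == (address >> 5)
            if same_hexaword or entry["forces_order"]:
                entry[stream] = True

    def read_may_proceed(self, stream):
        """A read waits while any conflict bit for its stream is set."""
        return not any(entry[stream] for entry in self.entries)

    def retire_oldest_write(self):
        """Completing a write clears its valid and conflict bits."""
        if self.entries:
            self.entries.pop(0)
```

With writes to three different hexawords queued and a DREAD that matches the second one, the DREAD is held until the first two writes retire and then proceeds ahead of the third, which is the ordering described above.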

This mechanism is used for other cases to enforce read/write ordering. Cases where the 
WRITE_QUEUE (and the WRITE_PACKER) must be flushed before proceeding are listed below: 

1. DREAD_LOCK and WRITE_UNLOCK. 

2. All IPR_READs and IPR_WRITEs (includes Clear Write Buffer). 

3. All I/O space reads and I/O space writes. 

4. DREAD or IREAD conflict with a write (checked to hexaword granularity, on address bits <31:5>). 

When a DREAD_LOCK arrives from the Mbox, DWR_CONFLICT bits for all valid writes in the 
WRITE_QUEUE and WRITE_PACKER are set so that all writes preceding the DREAD_LOCK 
are done before the DREAD_LOCK is done. 

When any IPR_READ arrives, all DWR_CONFLICT bits for valid entries in the WRITE_QUEUE 
and WRITE_PACKER are set, forcing the writes to complete before the IPR_READ completes. 
This ensures that IPR reads and writes are executed in order. 

When any D-stream I/O space read arrives, all DWR_CONFLICT bits for valid entries in the 
WRITE_QUEUE and WRITE_PACKER are set, so that previous writes complete first. 

When any I-stream I/O space read arrives, all IWR_CONFLICT bits for valid entries in the 
WRITE_QUEUE and WRITE_PACKER are set, so that previous writes complete first. 

Note that when a WRITE_UNLOCK arrives, the WRITE_QUEUE is always empty as it was 
previously flushed before the READ_LOCK was serviced. 

When a new entry for the DREAD_LATCH arrives, it is checked for conflicts with the 
WRITE_QUEUE. At this time the DWR_CONFLICT bit is set on any WRITE_QUEUE entry 
which is an I/O space write, an IPR_WRITE, or a WRITE_UNLOCK. Similarly, when a new 
entry for the IREAD_LATCH arrives, it is checked for conflicts with the WRITE_QUEUE. At this 
time the IWR_CONFLICT bit is set on any WRITE_QUEUE entry which is an I/O space write, 
an IPR_WRITE, or a WRITE_UNLOCK. 

Thus, all transactions from the Mbox except memory space reads and writes unconditionally 
force the flushing of the WRITE_QUEUE. Memory space reads cause a flush if they conflict with 
a previous write. 

If the WRITE_QUEUE fills up, the Cbox asserts C%WR_BUF_BACK_PRES_H. The Mbox then stops 
sending more writes to the Cbox until C%WR_BUF_BACK_PRES_H is deasserted. 

13.4.1.2 Cbox to Mbox Transactions 

The Cbox sends fills and invalidates to the Mbox. The signals which the Cbox drives in doing 
this are shown in Table 13-16. 



Table 13-16: Cbox to Mbox interface signals 

Field                   Purpose 
----------------------  ---------------------------------------------------------------------- 
C%CBOX_CMD_H<1:0>       Specific command being done: either D_CF, I_CF, INVAL, or NOP. 

C%CBOX_ADDR_H<31:5>     Hexaword address for invalidate sent to Mbox. 

C%REQ_DQW_H             Indicates that the quadword of fill data being returned was the requested 
                        quadword of data: the quadword to which the original address corresponded. 
                        It is also asserted if C%CBOX_HARD_ERR_H is asserted and the requested 
                        quadword has not yet been returned; the Mbox then notifies the Ibox and/or 
                        Ebox that the requested data has been returned so that the machine does 
                        not hang. 

C%LAST_FILL_H           Indicates that this is the last data being sent for the read request. 

C%CBOX_HARD_ERR_H       Indicates that an unrecoverable error is associated with the data. This bit 
                        only qualifies fills, not invalidates. When C%CBOX_HARD_ERR_H is asserted, 
                        the Cbox also asserts C%LAST_FILL_H as no more fills follow. 
                        C%CBOX_HARD_ERR_H may be asserted as the result of an uncorrectable error 
                        in the Bcache or as the result of RDE on the NDAL. 

C%CBOX_ECC_ERR_H        Indicates that a correctable backup cache ECC error is associated with the 
                        current fill data and the data should be ignored. Valid for fills only, not 
                        invalidates. Corrected data will follow. 

C%MBOX_FILL_QW_H<4:3>   Address bits to indicate to which quadword within the hexaword the current 
                        fill data belongs. 

B%S6_DATA_H<63:0>       Bus used to receive data from the Mbox and to send fill data to the Mbox. 

C%S6_DP_H<7:0>          Byte data parity for B%S6_DATA_H<63:0>. 



Table 13-17 shows what signals are driven and valid for every Cbox-to-Mbox transaction. 

If an error in the backup cache or on the NDAL happens while fill data is being retrieved, the 
Cbox notifies the Mbox using C%CBOX_HARD_ERR_H or C%CBOX_ECC_ERR_H. Table 13-18 shows 
how both normal cases and error cases are handled by the Mbox. 

Table 13-17 

                                            C%CBOX_CMD_H<1:0> 
Cbox-driven Signal or Bus  NOP (00)     INVAL (01)   I_CF (10)    D_CF (11) 
-------------------------  -----------  -----------  -----------  ----------- 
C%CBOX_ADDR_H<31:5>        X (1)        valid (2)    X            X 
C%REQ_DQW_H                0 (3)        0            valid (4)    valid 
C%LAST_FILL_H              0            0            valid        valid 
C%CBOX_HARD_ERR_H          0            0            valid        valid 
C%CBOX_ECC_ERR_H           X            X            valid        valid 
C%MBOX_FILL_QW_H<4:3>      X            X            valid        valid 
B%S6_DATA_H<63:0>          not driven   not driven   driven       driven 
C%S6_DP_H<7:0>             not driven   not driven   driven       driven 

(1) "X" denotes that the Cbox may drive any value to the Mbox, and the Mbox does not care what value is driven. 
(2) "valid" denotes that the signal is either asserted or deasserted by the Cbox, and the Mbox interprets it appropriately. 
(3) "0" denotes that the Cbox never asserts the signal in this case. 
(4) The Mbox ignores the value driven by the Cbox in this case. 



Table 13-18: Cbox to Mbox commands and resulting Mbox actions 

C%CBOX_CMD_H<1:0>          Qualifiers (1)               Mbox Action 
-------------------------  ---------------------------  ------------------------------------------------ 
NOP                        Qualifiers do not apply.     Take no action. 

I_CF or D_CF               None asserted.               Accept fill data for outstanding IREAD or DREAD; 
                                                        expect more. 

I_CF or D_CF               C%LAST_FILL_H asserted       Accept fill data for outstanding IREAD or DREAD; 
                                                        expect no more. 

I_CF or D_CF               C%CBOX_HARD_ERR_H,           Perform invalidate, expect no more fills for this 
                           C%LAST_FILL_H                read. (C%LAST_FILL_H is always asserted when 
                                                        C%CBOX_HARD_ERR_H is asserted.) 

I_CF or D_CF               C%CBOX_ECC_ERR_H             Ignore this fill data, expect fill later. 

I_CF or D_CF               C%CBOX_ECC_ERR_H and         Ignore this fill data, expect fill later. 
                           C%LAST_FILL_H 

I_CF or D_CF               C%CBOX_ECC_ERR_H and         This case never happens, and is disallowed. 
                           C%CBOX_HARD_ERR_H 

INVAL                      Qualifiers do not apply.     Perform invalidate. 

INVAL to outstanding fill  Qualifiers do not apply.     Perform invalidate, expect fill data. Do not 
                                                        validate the data in the Pcache when it returns. 

(1) Qualifiers covered in this table are: C%CBOX_HARD_ERR_H, C%LAST_FILL_H, and C%CBOX_ECC_ERR_H. 
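
The decision structure of Table 13-18 can be written out as a Python function. This is a sketch for illustration: the function name `mbox_action` and its return strings are hypothetical, the "INVAL to outstanding fill" row is omitted because it depends on Mbox state not described here, and raising an error for combinations the table disallows (or that the table says never occur) is a choice made for this model.

```python
def mbox_action(command, last_fill=False, hard_err=False, ecc_err=False):
    """Return the Mbox action for a Cbox command and its qualifiers."""
    if command == "NOP":
        return "take no action"            # qualifiers do not apply
    if command == "INVAL":
        return "perform invalidate"        # qualifiers do not apply
    if command not in ("I_CF", "D_CF"):
        raise ValueError("unknown command: " + command)
    if ecc_err and hard_err:
        raise ValueError("ECC error with hard error is disallowed")
    if ecc_err:
        # Same action whether or not C%LAST_FILL_H is asserted.
        return "ignore this fill data; expect fill later"
    if hard_err:
        if not last_fill:
            # Assumption of this model: the table states C%LAST_FILL_H is
            # always asserted with C%CBOX_HARD_ERR_H, so reject otherwise.
            raise ValueError("hard error without last fill should not occur")
        return "perform invalidate; expect no more fills"
    if last_fill:
        return "accept fill data; expect no more"
    return "accept fill data; expect more"
```

Note that the ECC error check comes before the last-fill check, since a fill flagged with a correctable error is ignored even when it is marked as the last one.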
13.4.1.2.1 CM_OUT_LATCH 

The CM_OUT_LATCH holds fill data and invalidate addresses which are destined for the Mbox. 
The Mbox never backpressures the Cbox (it can always receive a command from the Cbox) so a 
queue is not needed. The latch has an address portion and a data portion. The fields are shown 
in Table 13-19. 

Table 13-19: CM_OUT_LATCH Fields 

Field           Purpose 
--------------  ----------------------------------------------------------------------- 
CMD<1:0>        Specific command being done. 
ADDR<31:5>      Physical address of the invalidate. This field is not used for fills. 
FILL_QW<4:3>    Quadword alignment of the fill. This field is not used for invalidates. 
DATA<63:0>      Fill data. 



The CM_OUT_LATCH is loaded with an invalidate when the backup cache deallocates a valid 
block or when it performs an invalidate due to a cache coherency transaction on the NDAL. The 
CM_OUT_LATCH is loaded with cache fill data when the NDAL returns fill data which was 
requested by the Mbox or when a read request hits in the backup cache. Cbox control ensures 
that both events never happen in the same cycle. 

The command from the CM_OUT_LATCH is driven on C%CBOX_CMD_H<1:0>. If the command is an 
invalidate, the address is driven on C%CBOX_ADDR_H<31:5>, and no data is driven to the Mbox. If 
the command is a fill, the quadword alignment is driven on C%MBOX_FILL_QW_H<4:3>. (The Mbox 
has the hexaword address during these cycles.) Fill data is piped through the FILL_DATA_PIPEs 
and driven on B%S6_DATA_H<63:0>. The Cbox calculates byte parity on the fill data and drives it 
on C%S6_DP_H<7:0>. 
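
Byte parity over a quadword can be computed as in the Python sketch below. Two points are assumptions of this sketch rather than statements from this section: the parity sense (even by default here, selectable through the hypothetical `odd` parameter) and the mapping of parity bit i to data byte i. The function name `byte_parity` is also hypothetical.

```python
def byte_parity(data: int, odd: bool = False) -> int:
    """Return 8 parity bits for a 64-bit value, bit i covering data byte i.

    With odd=False each parity bit makes the total number of ones in its
    byte plus the parity bit even; odd=True inverts every bit. Which sense
    the hardware uses is not stated here, so both are offered.
    """
    parity = 0
    for i in range(8):
        byte = (data >> (8 * i)) & 0xFF
        bit = bin(byte).count("1") & 1   # 1 when the byte has an odd number of ones
        if odd:
            bit ^= 1
        parity |= bit << i
    return parity
```

A behavioral model would call this once per fill quadword and present the result alongside the data, mirroring the statement that parity is driven with the fill data.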

If an IREAD is in progress in the Cbox and the Mbox asserts M%ABORT_CBOX_IRD_H, the Cbox 
prevents any further command, address, or data for that IREAD from being driven to the Mbox, 
as described in Section 13.4.1.2.3. 

13.4.1.2.2 FILL_DATA_PIPE1 and FILL_DATA_PIPE2 

The FILL_DATA_PIPEs are used to pipeline the fill data for two cycles so that the Cbox drives 
B%S6_DATA_H<63:0> coincidentally with the write-enable of the Pcache. If there is a free cycle on 
B%S6_DATA_H<63:0>, the Cbox may bypass the fill data from the FILL_DATA_PIPE1 (to achieve a 
one-cycle bypass). This allows the Mbox to return data to the Ibox or the Ebox one cycle early. 
The cache fill to the Pcache is done in the normal cycle, driven from FILL_DATA_PIPE2, even if 
Ebox or Ibox data was bypassed in an earlier cycle. The timing relationships for one cache fill 
are shown in Figure 13-15. 

Figure 13-15: B%S6_DATA_H<63:0> Bypass Timing

[Timing diagram not reproduced. It covers cycles 1 through 4 and marks C%CBOX_CMD_H,
C%MBOX_FILL_QW_H<4:3>, M%CBOX_BYPASS_ENABLE_H, B%S6_DATA_H<63:0> valid for the one-cycle
data bypass, and B%S6_DATA_H valid for the Pcache fill, when the data is written to the Pcache.]



In this example, a fill is just arriving in cycle 1, so the Cbox drives C%CBOX_CMD_H and
C%MBOX_FILL_QW_H<4:3>.

The Mbox drives M%CBOX_BYPASS_ENABLE_H to the Cbox in cycle 2 to indicate that B%S6_DATA_H
is free during the current cycle. This causes the Cbox to bypass data from FILL_DATA_PIPE1
to B%S6_DATA_H to achieve a one-cycle bypass.

In cycle 3 the Cbox drives the data from FILL_DATA_PIPE2 to the Pcache for the write. It does 
this even though the bypass was done previously, because the Pcache is always written in the 
third cycle after C%CBOX_CMD_H is driven with the fill command. 

The rules for the Cbox driving data on B%S6_DATA_H are as follows:

1. IF FILL_DATA_PIPE2 contains valid data, drive B%S6_DATA_H from FILL_DATA_PIPE2.

2. ELSE IF M%CBOX_BYPASS_ENABLE_H is asserted and FILL_DATA_PIPE1 contains valid data,
drive from FILL_DATA_PIPE1 to achieve a one-cycle bypass.

The Mbox keeps enough state to know what the Cbox will be bypassing in any given cycle.

When the Cbox drives B%S6_DATA_H, it also generates byte parity and drives C%S6_DP_H with the
same timing.
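
The two drive rules above can be sketched as a small selection function. This is an illustrative model only, not the hardware implementation: the function name `select_s6_data` and the use of `None` to represent a pipe stage without valid data are assumptions made for the example.

```python
from typing import Optional

def select_s6_data(pipe1: Optional[int],
                   pipe2: Optional[int],
                   bypass_enable: bool) -> Optional[int]:
    """Return the quadword driven on B%S6_DATA_H this cycle, or None.

    pipe1 and pipe2 model FILL_DATA_PIPE1 and FILL_DATA_PIPE2; None means
    the stage holds no valid data (a modelling assumption).
    """
    # Rule 1: valid data in FILL_DATA_PIPE2 always wins (the Pcache fill).
    if pipe2 is not None:
        return pipe2
    # Rule 2: otherwise bypass from FILL_DATA_PIPE1 if the Mbox has
    # asserted M%CBOX_BYPASS_ENABLE_H.
    if bypass_enable and pipe1 is not None:
        return pipe1
    # Neither rule applies: no fill data is driven this cycle.
    return None
```

In this sketch, a cycle in which both stages hold data always drives the FILL_DATA_PIPE2 contents, regardless of the bypass enable.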
The fields of the FILL_DATA_PIPEs are shown in Table 13-20.

Table 13-20: Fields of FILL_DATA_PIPE1 and FILL_DATA_PIPE2

Field            Purpose

IREAD            Indicates that fill data is for an IREAD.

DATA<63:0>       Fill data.



The IREAD field is necessary in case of an IREAD abort, as described in Section 13.4.1.2.3. 
If M%ABORT_CBOX_IRD_H is asserted and the data in either FILL_DATA_PIPE1 or
FILL_DATA_PIPE2 is for an IREAD, that FILL_DATA_PIPE must be cleared so that data is
not driven back to the Mbox.
13.4.1.2.3 IREAD Aborts

The Mbox asserts the signal M%ABORT_CBOX_IRD_H to notify the Cbox to abort any IREAD which 
it is currently processing. This may happen because of a branch mispredict where the Istream 
has been prefetching from one branch and has to change over to the other. The Mbox then aborts 
all outstanding IREADs so that a new IREAD can begin. 

When the Cbox receives the abort signal, the read in question may be anywhere in the Cbox read
sequence. The exact action taken depends on where the read is, as shown in Table 13-21.

Table 13-21: Cbox Action Upon Receiving M%ABORT_CBOX_IRD_H

State of the IREAD           Action Taken by the Cbox

No IREAD outstanding         No action taken.

IREAD_LATCH valid            Clear the IREAD_LATCH so the request will not be started.
but not started

IREAD_LATCH valid and hit    Abort the hit calculation immediately. This frees the tag store and data RAMs
calculation in progress      for another request.

IREAD_LATCH valid            Abort the data RAM sequence immediately. The tag store and data RAMs are
and read hit in progress     freed up for another request.

IREAD valid in               Clear the TO_MBOX bit in the FILL_CAM entry. When the fill data returns
FILL_CAM                     from memory, validate it in the backup cache but don't send the data to the
                             Mbox.

IREAD fill data              Clear the entry containing IREAD data so that the data is not returned to the
in CM_OUT_LATCH or           Mbox.
FILL_DATA_PIPEs
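
The state-dependent behavior in Table 13-21 amounts to a lookup from the position of the IREAD to an action. The following is a minimal sketch of that lookup; the enumeration names and the short action strings are hypothetical labels invented for the example, not signal or state names from the design.

```python
from enum import Enum, auto

class IreadState(Enum):
    """Where the IREAD is when M%ABORT_CBOX_IRD_H is asserted (names assumed)."""
    NONE_OUTSTANDING = auto()
    LATCH_VALID_NOT_STARTED = auto()
    HIT_CALCULATION_IN_PROGRESS = auto()
    READ_HIT_IN_PROGRESS = auto()
    VALID_IN_FILL_CAM = auto()
    FILL_DATA_IN_FLIGHT = auto()   # in CM_OUT_LATCH or the FILL_DATA_PIPEs

# One entry per row of Table 13-21.
ABORT_ACTION = {
    IreadState.NONE_OUTSTANDING: "no action",
    IreadState.LATCH_VALID_NOT_STARTED: "clear IREAD_LATCH",
    IreadState.HIT_CALCULATION_IN_PROGRESS: "abort hit calculation",
    IreadState.READ_HIT_IN_PROGRESS: "abort data RAM sequence",
    IreadState.VALID_IN_FILL_CAM: "clear TO_MBOX in FILL_CAM entry",
    IreadState.FILL_DATA_IN_FLIGHT: "clear entry holding IREAD data",
}

def abort_action(state: IreadState) -> str:
    """Return the action the Cbox takes for an abort in the given state."""
    return ABORT_ACTION[state]
```

Only the FILL_CAM case leaves work outstanding: the fill still completes into the backup cache, but nothing is forwarded to the Mbox.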



Figure 13-16 shows an example of timing for the Cbox abort response. In cycle 1,
M%ABORT_CBOX_IRD_H is asserted during phase 2. The Cbox is ready to drive the I_CF command
and B%S6_DATA_H during phase 4. The assertion of M%ABORT_CBOX_IRD_H prevents both of those
actions.

The next IREAD may appear two cycles after the abort.



Figure 13-16: M%ABORT_CBOX_IRD_H Timing

[Timing diagram not reproduced. It covers cycles 1 through 3 and marks the assertion of
M%ABORT_CBOX_IRD_H, the I_CF command on C%CBOX_CMD_H that is not driven due to the abort,
the B%S6_DATA_H for the I_CF that is not driven due to the abort, and the point at which the
Mbox may send the next IREAD.]
13.4.2 ECC Datapaths 1

The backup cache tag store and data store are both protected by error-detect-and-correct codes 
(ECC). ECC was chosen for its capability to correct errors because the cache is writeback and 
may contain the only copy of data in the system. 

The codes employed detect and/or correct the following errors: 

1. Detect and correct single-bit errors. 

2. Detect double-bit errors. 

3. Detect three- and four-bit failures if within one nibble.

4. Detect some addressing failures.

5. Detect all-zeros failure on all protected bits.

6. Detect all-ones failure on all protected bits.

In general, ECC works as follows: Some number of check bits are generated. Each check bit is 
parity calculated over some subset of the data bits to be protected. The data bits and the check 
bits together are known as a code word. 

When data is written, the check bits are calculated and stored with the data; when data is read 
the check bits are regenerated and compared against the stored check bits. The result of the 
comparison is called the syndrome; if it is all zeros there is no error. The syndrome is passed 
through the syndrome decoder, which decodes one of N states. Each of the N states corresponds 
to one of the data or check bits being protected by ECC. 

If the syndrome does not decode successfully, the error is recognized as uncorrectable. If it does 
decode successfully, the output of the decoder indicates which bit is in error and that bit is inverted 
to achieve the data correction. 
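
The general flow just described (generate check bits, regenerate on read, form the syndrome, decode it) can be illustrated with a toy code. The sketch below is NOT the NVAX code: it is a hypothetical 4-data-bit, 4-check-bit code with odd-weight columns, chosen only because it is small enough to follow by hand. All names are invented for the example.

```python
# Toy code: each mask selects the data bits feeding one parity tree.
CHECK_MASKS = [0b0111, 0b1011, 0b1101, 0b1110]

def parity(x: int) -> int:
    """XOR of all bits of x."""
    return bin(x).count("1") & 1

def gen_check(data: int) -> int:
    """Generate the check bits for a 4-bit data value."""
    return sum(parity(data & m) << i for i, m in enumerate(CHECK_MASKS))

# Syndrome decoder: each decodable syndrome names one protected bit.
DECODE = {}
for j in range(4):   # syndrome produced by a single error in data bit j
    DECODE[sum(((m >> j) & 1) << i for i, m in enumerate(CHECK_MASKS))] = ("data", j)
for i in range(4):   # syndrome produced by a single error in check bit i
    DECODE[1 << i] = ("check", i)

def read_and_correct(data: int, stored_check: int):
    """Return (data, status) after comparing regenerated and stored check bits."""
    syndrome = gen_check(data) ^ stored_check
    if syndrome == 0:
        return data, "ok"
    hit = DECODE.get(syndrome)
    if hit is None:
        return data, "uncorrectable"   # syndrome does not decode
    kind, bit = hit
    if kind == "data":
        data ^= 1 << bit               # invert the bit in error
    return data, "corrected"
```

For example, flipping one data bit of a stored value yields a syndrome that decodes to that bit, while flipping two data bits yields a non-zero syndrome that matches no entry and is reported as uncorrectable.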

13.4.2.1 Backup Cache Tag Store ECC 

Figure 13-17 shows a block diagram of the ECC datapath for tag store ECC.
P%TS_TAG_H<31:17>, P%TS_OWNED_H, and P%TS_VALID_H are protected directly by
ECC.

When the tag store is written, the generated check bits are written into the RAMs with the tag, 
valid and owned bits. When the tag store is read, the check bits are regenerated on the stored 
tag, valid and owned bits and compared with the stored check bits. The result of the comparison 
is the syndrome, which decodes to tell the hardware which bit is in error. 



1 See Steve EUdnd's memo of 31 January 1989, ECC Codes for NVAX Bcache, for more detail about the codes chosen.
Figure 13-17: Tag Store ECC Block Diagram

[Block diagram not reproduced. Note from the figure: each parity tree has a different subset of
the data lines as inputs.]
A failure in addressing the RAMs is covered indirectly in the following way: When an
entry is written into the tag store, even parity is generated on the on-chip version of
P%TS_INDEX_H<20:5>. This is the address parity bit. (Those bits of P%TS_INDEX_H<20:17>
which are not required to address the RAMs, based on cache size selection, are zeroed during
parity generation.) The address parity bit, P%TS_TAG_H<31:17>, P%TS_OWNED_H, and
P%TS_VALID_H are all used in generating the check bits to be stored. The address parity
bit itself is not actually stored.

When an entry is read from the tag store, parity on P%TS_INDEX_H<20:5> is recalculated and
used in the regeneration of the check bits, which are then compared with the stored check bits.
If there was an addressing failure in either reading or writing the RAMs, and the regenerated
check bits do not match the stored check bits, the output of the syndrome decoder indicates that
the address bit is in error. Addressing failures are only detected if the failure was such that
incorrect parity is produced from the address.
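
The indirect address protection can be sketched as follows. This is a toy model under stated assumptions: the 4-bit payload, the parity-tree masks, and the choice of which trees the address parity bit feeds (`AP_TREES`) are all hypothetical and do not reproduce the code matrix of Figure 13-18. Only the use of index bits <20:5> and the zeroing of unused bits come from the text.

```python
INDEX_BITS_20_5 = ((1 << 21) - 1) & ~((1 << 5) - 1)   # selects index bits <20:5>

def parity(x: int) -> int:
    """XOR of all bits of x."""
    return bin(x).count("1") & 1

def address_parity(index: int, unused_mask: int = 0) -> int:
    """Address parity bit; index bits the cache size does not use are zeroed first."""
    return parity(index & INDEX_BITS_20_5 & ~unused_mask)

# Hypothetical toy code: four parity trees over a 4-bit payload.
PAYLOAD_MASKS = [0b0111, 0b1011, 0b1101, 0b1110]
AP_TREES = 0b0011   # assumed: the address parity bit feeds trees 0 and 1

def check_bits(payload: int, index: int) -> int:
    """Check bits over the payload and the (unstored) address parity bit."""
    bits = sum(parity(payload & m) << i for i, m in enumerate(PAYLOAD_MASKS))
    return bits ^ (AP_TREES if address_parity(index) else 0)

def read_syndrome(payload: int, stored_check: int, index: int) -> int:
    """Syndrome seen when the entry is read back at the given index."""
    return check_bits(payload, index) ^ stored_check
```

In this sketch, an entry written at index 0x1000 and read back at 0x3000 produces a non-zero syndrome because the two indexes have different parity, while a read at 0x7000 goes undetected because the parity happens to match. This mirrors the limitation stated above.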
The ECC datapath makes a "predictive" ECC possible, which is used in the hit calculation. While
the tag RAMs are being accessed, the six predictive ECC check bits are calculated on the expected
tag, valid, and owned bits. This predictive ECC is then compared with the actual ECC check bits
read from the tag RAMs during the hit calculation. In this way, an ECC error prevents a cache
hit, so that a hit is never detected and then rescinded due to an error.

The code used for tag ECC is shown in Figure 13-18. The check bit which is marked with a "1" in 
each row is generated by a parity tree whose inputs are the Tag, Valid, Owned, and AP (address 
parity) bits which are marked with a "1" in that row. 



Figure 13-18: Tag Store Error Correcting Code Matrix 
In a tag store read operation, a non-zero syndrome indicates an error. If the syndrome generated
matches one of the columns in the matrix, the error is correctable and the matching column
indicates the bit to be corrected. For example, if syndrome<5:0> equals 011100 (BIN), then tag bit
<31> must be inverted to correct the problem. Any syndrome value which is non-zero and does
not match a column in the matrix indicates an uncorrectable error.

This code has the property that if any three or four bits in one nibble are in error, the syndrome
produced will not match any matrix column. This means that an uncorrectable error will be
flagged for a single 4-bit-wide RAM failure. It does not necessarily protect against single RAM
failures if 8-bit-wide RAMs are used.



NOTE 

Nibble protection only works if the bits in each nibble shown in the matrix are 
physically stored in the same RAM chip. The board designer must ensure that this is 
the case. 



Odd parity is used for check bits 1 and 4 to protect against the all-zeros failure mode. Otherwise, 
all-zeros would be a valid code word. The choice of odd and even parity bits prevents all-ones 
from being a valid code word as well. 

13.4.2.2 Backup Cache Data Store ECC 

Figure 13-19 shows a block diagram of the ECC datapath for data RAM ECC.
P%DR_DATA_H<63:0> are protected directly by ECC. Address failure is covered indirectly in
the same manner as it is covered on the tag store. When data is written into the data RAMs,
parity is generated on the on-chip version of P%DR_INDEX_H<20:3> and used as an additional
data bit in generating the check bits to be stored. The address parity bit is not actually stored.
When an entry is read from the data RAMs, parity on P%DR_INDEX_H<20:3> is recalculated
and used in the regeneration of the check bits, which are then compared (XOR'd) with the stored
check bits to produce the syndrome for the transaction. (If a cache size is selected which does not
use some or all of P%DR_INDEX_H<20:17>, those bits are zeroed during the parity calculation.)
In many cases an address failure is detected because the check bits will not match and an error
is flagged.

The syndrome is used to calculate whether there was an error, and if so, and it was a correctable 
error, the syndrome tells which bit needs to be corrected. 

The code used for data ECC is shown in Figure 13-20. The check bit (C) which is marked with
a "1" in each row is generated by a parity tree whose inputs are the data bits marked with a "1"
in that row.

As in tag store ECC, any syndrome value which is non-zero and does not match a column in
the table indicates an uncorrectable error. A correctable error is indicated when the syndrome
matches a column in the table. For example, data bit <44> must be inverted to correct the error
if syndrome<7:0> equals 10000011 (BIN).

This code has the property that if any three or four bits in one nibble are in error, the syndrome 
produced will not match any matrix column. This means that an uncorrectable error will be 
flagged for a single 4-bit-wide RAM failure. 

NOTE 

Nibble protection only works if the bits in each 4-bit nibble shown in the matrix are 
physically stored in the same RAM chip. The board designer must ensure that this is 
the case. If x8 RAMs are used, the failure of an entire RAM chip is not protected by 
the code. 

Odd parity is used in check bits 3 and 7 to prevent all-ones and all-zeros from being valid code 
words. 
Figure 13-19: Data RAM ECC Block Diagram

Figure 13-20: Backup Cache Data Store Error Correcting Code Matrix

13.4.3 The BIU

The BIU contains the NDAL pads, the NDAL_IN_QUEUE, the WRITEBACK_QUEUE, the
NON_WRITEBACK_QUEUE, the BIU IPRs, and timeout counters for outstanding reads. The
pads are run on the NDAL clocks, while the rest of the BIU is run on the NVAX internal clocks.

The BIU IPRs are described in Section 13.5; the rest of the BIU is described here. 

13.4.3.1 NDAL_IN_QUEUE 

The NDAL_IN_QUEUE receives fill data and cache coherency requests from the NDAL. It consists 
of 8 quadword entries for fill data and two entries for cache coherency addresses. Queue control 
ensures that each entry is processed in the order in which it was received, so that fills and 
coherency requests are always processed in order. 

The BIU also uses the NDAL_IN_QUEUE mechanisms to inform the FILL_CAM that a read
transaction was not acknowledged or timed out before the fill data returned.

The 8 fill data slots ensure that there is always room in the queue for CPU fill data being returned 
from memory. 

The two cache coherency slots are managed through the assertion of P%CPU_SUPPRESS_L.
The BIU asserts P%CPU_SUPPRESS_L on the NDAL to prevent the cache coherency slots
from overflowing. When one slot fills, the BIU must assert P%CPU_SUPPRESS_L immediately
because the next NDAL cycle may be another cache coherency cycle, which would fill both queue
slots. This means that two cache coherency commands may be received only if they are on
back-to-back cycles; if only one is received, P%CPU_SUPPRESS_L is asserted until that one is
handled by the Cbox. This should happen quickly since the NDAL_IN_QUEUE is serviced by the
Cbox as the highest priority task.

The BIU deasserts P%CPU_SUPPRESS_L when it is able to accept more cache coherency
commands. Note that fill data may always return, whether or not P%CPU_SUPPRESS_L is
asserted, as there is always room in the queue for fill data.
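
The flow control on the two cache coherency slots can be sketched as a small queue model. This is a simplified illustration, not the BIU logic: the class and method names are hypothetical, and the sketch assumes that P%CPU_SUPPRESS_L is asserted whenever at least one coherency slot is occupied, which is one reading of the description above.

```python
from collections import deque

class CoherencySlots:
    """Toy model of the two cache coherency slots in the NDAL_IN_QUEUE."""
    DEPTH = 2

    def __init__(self) -> None:
        self.slots = deque()

    @property
    def suppress(self) -> bool:
        # Assumption: suppress is asserted while any slot is occupied, because
        # a second request may already be arriving on the next NDAL cycle.
        return len(self.slots) > 0

    def receive(self, address: int) -> None:
        """Load a coherency request that arrived from the NDAL."""
        if len(self.slots) >= self.DEPTH:
            raise OverflowError("cache coherency slots overflowed")
        self.slots.append(address)

    def service(self) -> int:
        """The Cbox handles the oldest request, in arrival order."""
        return self.slots.popleft()
```

Fill data is deliberately left out of the model, since the eight fill slots always have room and are not subject to the suppress mechanism.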

The NDAL_IN_QUEUE is loaded with a valid entry to be processed by the Cbox (1) whenever
there is a valid memory address cycle on the NDAL, where P%ID_H<2:1> is not equal to the
NVAX ID, and which is accompanied by one of the following commands: IREAD, DREAD, OREAD,
or WRITE (cache coherency cycles); (2) whenever there is a Read Data Return or Read Data Error
cycle on the NDAL and P%ID_H<2:1> indicates that it belongs to the CPU; (3) when the BIU
detects NACK for an outgoing read; (4) when a read transaction times out before data is returned.
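
The four load conditions can be written as a single predicate. The sketch below is illustrative only: the event names, the function name `loads_ndal_in_queue`, and the representation of the ID field as a small integer are assumptions made for the example.

```python
COHERENCY_COMMANDS = {"IREAD", "DREAD", "OREAD", "WRITE"}

def loads_ndal_in_queue(event: str, *, command: str = "",
                        cycle_id: int = -1, nvax_id: int = 0) -> bool:
    """Return True if the event loads a valid entry into the NDAL_IN_QUEUE.

    event is one of the hypothetical labels "address_cycle",
    "read_data_return", "read_data_error", "read_nack", "read_timeout".
    cycle_id models P%ID_H<2:1> on the NDAL cycle.
    """
    if event == "address_cycle":
        # (1) cache coherency cycle from another node
        return cycle_id != nvax_id and command in COHERENCY_COMMANDS
    if event in ("read_data_return", "read_data_error"):
        # (2) fill data or read error that belongs to this CPU
        return cycle_id == nvax_id
    # (3) NACK for an outgoing read, (4) read timeout
    return event in ("read_nack", "read_timeout")
```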

The fields of the two portions of the NDAL_IN_QUEUE are shown in Table 13-22.

Table 13-22: NDAL_IN_QUEUE Fields

Field                    Purpose

Fill entries

VALID                    Indicates that the entry contains valid information.

DATA<63:0>               Fill data being returned.

Cache coherency entries

VALID                    Indicates that the entry contains valid information.

ADDRESS<31:5>            The address of the cache coherency request.

When the BIU sends a transaction from the NDAL_IN_QUEUE to the Cbox proper, it is
accompanied by one of the commands shown in Table 13-23.

Table 13-23: BIU commands sent to Cbox proper

Command name             Meaning

C_BIU%%NOP_CMD           No operation.

C_BIU%%FILL_0_CMD        Fill for FILL_CAM entry 0.

C_BIU%%FILL_1_CMD        Fill for FILL_CAM entry 1.

C_BIU%%RDE_0_CMD         Read Data Error for FILL_CAM entry 0.

C_BIU%%RDE_1_CMD         Read Data Error for FILL_CAM entry 1.

C_BIU%%NACK_0_CMD        No NDAL acknowledgement received for read from FILL_CAM entry 0.

C_BIU%%NACK_1_CMD        No NDAL acknowledgement received for read from FILL_CAM entry 1.

C_BIU%%TIMO_0_CMD        Read from FILL_CAM entry 0 has timed out.

C_BIU%%TIMO_1_CMD        Read from FILL_CAM entry 1 has timed out.

C_BIU%%INVAL_R_CMD       Cache coherency request resulting from a DREAD or an IREAD on the NDAL.

C_BIU%%INVAL_0_CMD       Cache coherency request resulting from an OREAD or a WRITE on the NDAL.

No address is returned for fills, as the NDAL P%ID_H<0> which is returned tells the Cbox which 
FILL_CAM entry was used for the read address. This information is encoded in the commands 
in Table 13-23. The Cbox uses the backup cache index from the FILL_CAM to write the correct 
locations in the tag store and data RAMs. 

There are four separate NDAL Read Data Return commands to allow the Cbox to identify the
quadwords within the hexaword as they return. The lower two bits of the NDAL command
are encoded to represent bits <4:3> of a quadword address. The BIU passes these bits to the
CBOX_BIU_INTERFACE, which drives them onto C_BUS%ABUS_H<4:3> when the data is driven
onto C_BUS%DBUS_H<63:0>. The information is then driven to the Bcache and to the Mbox. In this
way the correct quadword cache entry is written in both caches.
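
The way a returning fill is identified can be sketched as two small helpers. This is an illustration under stated assumptions: the function names are hypothetical, and the full NDAL command encodings are not given here, so only the lower two command bits and ID bit <0> are interpreted.

```python
from typing import Tuple

def decode_fill(ndal_cmd: int, ndal_id: int) -> Tuple[int, int]:
    """Return (FILL_CAM entry, quadword number) for a Read Data Return cycle."""
    entry = ndal_id & 0x1      # P%ID_H<0> selects FILL_CAM entry 0 or 1
    quadword = ndal_cmd & 0x3  # lower two command bits give address bits <4:3>
    return entry, quadword

def fill_quadword_address(hexaword_address: int, quadword: int) -> int:
    """Combine a hexaword (32-byte) aligned address with address bits <4:3>."""
    return (hexaword_address & ~0x1F) | (quadword << 3)
```

For instance, quadword 2 of the hexaword containing address 0x12345678 lies at 0x12345670.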
13.4.3.2 NON_WRITEBACK_QUEUE

All outgoing commands except disown writes pass through the NON_WRITEBACK_QUEUE.
When the backup cache is on, the NON_WRITEBACK_QUEUE contains read misses, OREADs
due to write misses, and I/O space reads and writes. When the backup cache is off, all transactions
except quadword disown writes (which result from WRITE_UNLOCKs) go out through the
NON_WRITEBACK_QUEUE.

The NON_WRITEBACK_QUEUE has two entries. The fields of each entry in the queue are 
shown in Table 13-24. 

Table 13-24: NON_WRITEBACK_QUEUE Fields

Field                    Purpose

VALID                    Indicates that the entry contains valid information.

CMD<3:0>                 Specific command being done.

ID0                      Identification, driven onto P%ID_H<0>, for outgoing reads only.

ADDRESS<31:0>            Address of the outgoing command.

LENGTH<63:62>            Length of the outgoing command.

BYTE_ENABLE<47:40>       Byte enable.

DATA<63:0>               Data, used if the outgoing command is a write.



The format of the address field corresponds to that of an address cycle on the NDAL, which is 
described in Section 3.3.4.1. 

Writes from this queue are always byte-enabled quadword writes whether to memory space or 
I/O space. 

The NON_WRITEBACK_QUEUE has a backpressure signal so that when it gets full, the Cbox 
stalls transactions from the Mbox until there is room in the queue to proceed. Fills and cache 
coherency transactions continue normally. 



13.4.3.3 WRITEBACK_QUEUE 

The WRITEBACK_QUEUE holds addresses and data for write disowns to memory. It contains 
two entries, each consisting of address and data for either a hexaword or a quadword disown 
write. 

Table 13-25 shows the fields in the WRITEBACK_QUEUE. 



Table 13-25: WRITEBACK_QUEUE Fields

Field                    Purpose

VALID                    Indicates that the entry contains valid information.

CMD<3:0>                 Specific command being done.

ADDRESS<63:0>            Address cycle for the writeback.

DATA0<63:0>              First quadword of writeback data.

DATA1<63:0>              Second quadword of writeback data.

DATA2<63:0>              Third quadword of writeback data.

DATA3<63:0>              Fourth quadword of writeback data.

BYTE_ENABLE<7:0>         Byte enable for quadword disown writes.

The format of the address field corresponds to that of an address cycle on the NDAL, which is 
described in Section 3.3.4.1. 

When a disown write is done, the ADDRESS field is first loaded. CMD<3:0> is loaded with the 
WDISOWN command. Four quadwords of write data are loaded if the transaction is hexaword 
length; if the transaction is quadword length, one quadword of data is loaded. 

All writeback data is read from the data RAMs before the NDAL transaction is started, to simplify
error handling. If a quadword of data is read out with an uncorrectable error, the command field
sent with that data cycle is changed from WDATA to BADWDATA.

The WRITEBACK_QUEUE always takes priority over the NON_WRITEBACK_QUEUE in 
driving the NDAL. 

The WRITEBACK_QUEUE backpressures the Cbox control when it gets full, causing the 
following: 

1. All reads from the Mbox are prevented. 

2. All writes from the Mbox are prevented. 

3. All fills are prevented. 

4. All cache coherency lookups are prevented. 
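
The priority and backpressure behavior of the two outgoing queues can be summarized in a short sketch. The function names and the activity labels are hypothetical; the behavior is taken from the descriptions of the NON_WRITEBACK_QUEUE and WRITEBACK_QUEUE above, and the model ignores all timing.

```python
from typing import Optional, Set

def next_ndal_source(writeback_valid: bool,
                     non_writeback_valid: bool) -> Optional[str]:
    """Pick which queue drives the NDAL next; the WRITEBACK_QUEUE has priority."""
    if writeback_valid:
        return "WRITEBACK_QUEUE"
    if non_writeback_valid:
        return "NON_WRITEBACK_QUEUE"
    return None

def blocked_activities(writeback_full: bool,
                       non_writeback_full: bool) -> Set[str]:
    """Return the Cbox activities prevented by queue backpressure."""
    blocked: Set[str] = set()
    if non_writeback_full:
        # Mbox transactions stall; fills and coherency continue normally.
        blocked |= {"mbox_reads", "mbox_writes"}
    if writeback_full:
        # A full WRITEBACK_QUEUE also holds off fills and coherency lookups.
        blocked |= {"mbox_reads", "mbox_writes", "fills", "coherency_lookups"}
    return blocked
```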
13.4.3.4 Timeout counters

The BIU has two timeout counters, one for each read request which may be outstanding. If all 
the fills for an outstanding read have not completed when the associated timeout counter expires, 
the BIU notifies the FILL_CAM of the error and it is handled as described in Chapter 3. 

The NVAX timeout counters are shown in Figure 13-21. The Ebox contains the Ebox base counter
and the Ebox counter, which counts Ebox stall cycles. The Cbox contains two read counters which,
in normal mode, are driven from the Ebox base counter. The Ebox counters are described in detail
in Chapter 8.

Three IPR bits control the operation of the timeout counters. When ECR<TIMEOUT_EXT>, 
ECR<S3_TIMEOUT_TEST>, and CCTL<TIMEOUT_TEST> are all cleared, the counters are 
in normal mode. When ECR<TIMEOUT_EXT> is set, an external timebase may be used to 
lengthen the timeout period; when CCTL<TIMEOUT_TEST> is set, the read timeout counters 
are placed in test mode, under which the read timeout values are shortened; and when 
ECR<S3_TIMEOUT_TEST> is set, the Ebox counter is put in test mode, under which the S3 
timeout value is shortened. 
Figure 13-21: NVAX Timeout Counters

In normal mode, the Cbox and the Ebox share the base counter, which is run from the internal
NVAX clock. The 12-bit Ebox counter and the 8-bit Cbox read counters are clocked with the
global signal, E%TIMEOUT_ENABLE_H, which is generated from the 16-bit base counter. In
normal mode, E%TIMEOUT_ENABLE_H is asserted for one NVAX internal cycle when the Ebox
base counter overflows; if an external timebase is used (if ECR<TIMEOUT_EXT> is asserted),
E%TIMEOUT_ENABLE_H is asserted for one cycle of the external timebase when the counter
overflows. E%TIMEOUT_BASE_H is always asserted when the timeout counter is in normal mode; if
ECR<TIMEOUT_EXT> is asserted, E%TIMEOUT_BASE_H is asserted for one NVAX internal cycle
when the input clock transitions high.

The timeout values for normal mode are shown in Table 13-26.

Table 13-26: NVAX Timeout Values in Normal Mode

Cycle time    Timeout             Read timeout 1                   Ebox timeout 1
              Granularity

10-ns NVAX    655 microseconds    167.117 (minimum) to 167.772     2.6837 (minimum) to 2.68435
                                  (max) milliseconds               (max) seconds

12-ns NVAX    786 microseconds    200.54 (minimum) to 201.327      3.22044 (minimum) to 3.22123
                                  (max) milliseconds               (max) seconds

14-ns NVAX    917 microseconds    233.964 (minimum) to 234.881     3.75718 (minimum) to 3.7581
                                  (max) milliseconds               (max) seconds

1 The timeout logic is in normal mode when ECR<TIMEOUT_EXT>, CCTL<TIMEOUT_TEST>, and
ECR<S3_TIMEOUT_TEST> are all cleared.



Each Cbox read counter is initialized to zero when it is not enabled with either
C_BIU_NOC_5%BXI_TIMO_0_EN_H or C_BIU_NOC_5%BXI_TIMO_1_EN_H, and counts as long as the read
is outstanding. If all the fills do not return within the timeout period, the counter overflows and
C_BIU_NOC%BXI_TIMO_0_LAT_H or C_BIU_NOC%BXI_TIMO_1_LAT_H is asserted. As a result, the read
is aborted, the timeout counter is reset to zero, and the error is handled as described in Chapter 3.
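
The behavior of one Cbox read counter can be modelled in a few lines. This is a behavioral sketch only: the class name, the `tick` method, and the `timed_out` flag (standing in for the ..._LAT_H signal) are assumptions made for the example. The 8-bit width comes from the description of the counters above.

```python
class ReadTimeoutCounter:
    """Toy model of one 8-bit Cbox read timeout counter."""

    def __init__(self, width_bits: int = 8) -> None:
        self.limit = 1 << width_bits
        self.count = 0
        self.timed_out = False   # models the latched timeout indication

    def tick(self, read_outstanding: bool, timeout_enable: bool) -> None:
        """Advance one NVAX internal cycle.

        timeout_enable models a pulse of E%TIMEOUT_ENABLE_H in this cycle.
        """
        if not read_outstanding:
            self.count = 0       # held at zero while not enabled
            return
        if timeout_enable:
            self.count += 1
            if self.count >= self.limit:
                # Counter overflow: flag the timeout and reset to zero.
                self.timed_out = True
                self.count = 0
```

With an enable pulse on every cycle, as in test mode, the model times out on the 256th tick of an outstanding read.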

If a system designer needs to lengthen the timeout values, an external timebase, K%EXT_TMBS_H,
can be selected by setting ECR<TIMEOUT_EXT> in the Ebox control register. In this case,
the Ebox base counter is clocked with the external timebase, which enters the chip through
P%OSC_TC1_H.

The counters are configurable for use at chip test and at power-up test. At chip test and/or 
during power-up diagnostics, the read counters can be tested in the following way: Set 
CCTL<TIMEOUT_TEST> so that the Cbox counters run off the internal NVAX clock. Clear 
ECR<S3_TIMEOUT_TEST>. Do a read of a memory or I/O space location which will not respond 
within the timeout period. A read timeout should occur. This must be done for each timeout 
counter. 

The timeout values for the Cbox and Ebox counters in test mode are shown in Table 13-27. 



Table 13-27: NVAX Timeout Values In Test Mode

Cycle time    Timeout           Read timeout 1                   Ebox timeout 2
              Granularity

10-ns NVAX    10 nanoseconds    2.55 (minimum) to 2.56 (max)     40.95 (minimum) to 40.96 (max)
                                microseconds                     microseconds

12-ns NVAX    12 nanoseconds    3.06 (minimum) to 3.072          49.14 (minimum) to 49.152 (max)
                                (max) microseconds               microseconds

14-ns NVAX    14 nanoseconds    3.57 (minimum) to 3.584          57.33 (minimum) to 57.344 (max)
                                (max) microseconds               microseconds

1 Read timeout test is done under these conditions: ECR<TIMEOUT_EXT> and ECR<S3_TIMEOUT_TEST> cleared;
CCTL<TIMEOUT_TEST> set.

2 Ebox timeout test is done under these conditions: ECR<TIMEOUT_EXT> and CCTL<TIMEOUT_TEST> cleared;
ECR<S3_TIMEOUT_TEST> set.
Forcing timeouts cannot be done by reading nonexistent memory or I/O: NDAL designers respond
to nonexistent memory and I/O space with either NACK or RDE, which happen well before the
timeout counters expire. A timeout can be accomplished in the following way:

1. Do a write or a read-modify-write which causes an OREAD to bring owned data into the
backup cache.

2. Do an IPR WRITE to clear the owned bit in the backup cache tag store.

3. Perform another operation which requires ownership in the Bcache. This OREAD will time out
because it won't hit in the backup cache and memory won't respond because it believes the
backup cache owns it.

4. Do an IPR WRITE to the Bcache tag store to put it back into the owned state. 

The list which follows describes a scenario in which read data takes a long time to return to the 
Ebox. This case should not approach the Ebox timeout value; it is given to illustrate what can 
keep data from returning quickly to the Ebox. 

1. The Cbox write queue is full. 

2. A Dread, call it Dread A, enters the Cbox and has a conflict with the last write queue entry, 
Write A, which means that the whole write queue must be cleared out before Dread A can 
proceed. 

3. The writes in the write queue all miss in the Bcache, and each one requires a writeback from
another CPU which owns the block. As each writeback is done, the data is returned to the
Bcache, ownership is passed to the Bcache, and the write queue is emptied of one write. In
this scenario, eight writebacks are required before Dread A can be processed.

4. After the Oread for Write A reaches the NDAL, an invalidate arrives for A. After the data
is returned and Write A is processed, the block will be written back, due to the previous
invalidate.

5. Now Dread A will miss in the Bcache, and it will have to wait for another writeback. 
Eventually this read data will return, and the Ebox gets its data. 

DERIVATION OF TIMEOUT VALUES

The timeout values given on the previous pages were derived from NVAX cycles as
follows:

Table 13-28: Derivation of NVAX Timeout Values

NVAX mode    Timeout Granularity    Read timeout                 Ebox timeout
             (in NVAX cycles)       (in NVAX cycles)             (in NVAX cycles)

Normal       2**16                  2**24 - 2**16 (minimum)      2**28 - 2**16 (minimum)
                                    to 2**24 (max)               to 2**28 (max)

Test         1                      2**8 - 1 (minimum)           2**12 - 1 (minimum)
                                    to 2**8 (max)                to 2**12 (max)
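
The entries of Table 13-28 follow a single pattern, which the sketch below reproduces. The function names are hypothetical. The counter widths (8-bit read counters, 12-bit Ebox counter, 16-bit base counter) are taken from the text. The comment explaining why the minimum is one granularity period shorter than the maximum is an interpretation, not a statement from the specification.

```python
def timeout_range_cycles(counter_bits: int, granularity_cycles: int):
    """Return (minimum, maximum) timeout in NVAX cycles."""
    maximum = (1 << counter_bits) * granularity_cycles
    # Presumably the first enable pulse can arrive right after the counter
    # starts, so up to one granularity period may be lost.
    minimum = maximum - granularity_cycles
    return minimum, maximum

def cycles_to_seconds(cycles: int, cycle_ns: int) -> float:
    """Convert a cycle count to seconds for a given cycle time in ns."""
    return cycles * cycle_ns / 1e9

# Normal mode: counters advance once per overflow of the 16-bit base counter.
READ_NORMAL = timeout_range_cycles(8, 1 << 16)
EBOX_NORMAL = timeout_range_cycles(12, 1 << 16)
# Test mode: counters advance every NVAX cycle.
READ_TEST = timeout_range_cycles(8, 1)
EBOX_TEST = timeout_range_cycles(12, 1)
```

Converting `READ_NORMAL` at a 10-ns cycle time gives roughly 167.117 to 167.772 milliseconds, in agreement with Table 13-26.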
13.4.3.5 BIU clocking: Relating internal cycles to external cycles

Three NVAX internal cycles take place in the time of one NDAL cycle. The BIU relates internal
cycles to external cycles by naming the internal cycles according to where they fall relative to the
external cycle. This is shown in Figure 13-22.

Figure 13-22: BIU cycle counts

[Timing diagram not reproduced. It shows the BIU cycle count signals C_BIU%CYCLE_1_H,
C_BIU%CYCLE_2_H, and C_BIU%CYCLE_3_H, each spanning one NVAX internal cycle (PHI1 through
PHI4), against one NDAL external cycle (NDAL PHI1 through NDAL PHI4).]









The BIU has a shift register which asserts only one of the signals C_BIU%CYCLE_1_H,
C_BIU%CYCLE_2_H, and C_BIU%CYCLE_3_H during any given NVAX cycle. This shift register is
initialized properly by K_CE%RESET_H, which comes from the clock section of the chip. During
reset, the clock section asserts K_CE%RESET_H during every NDAL phase 4, allowing the BIU to
initialize the shift register properly.
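
A one-hot shift register of this kind can be sketched as a single step function. The encoding and the function name are hypothetical. The sketch also assumes that a reset seen during NDAL phase 4 leaves the register indicating cycle 3, so that the following internal cycle is cycle 1; the text does not state the reset value explicitly.

```python
CYCLE_1, CYCLE_2, CYCLE_3 = 0b001, 0b010, 0b100   # one-hot encoding (assumed)

def next_biu_cycle(state: int, reset: bool = False) -> int:
    """Advance the one-hot BIU cycle indicator by one NVAX internal cycle."""
    if reset:
        # Assumption: reset arrives in NDAL phase 4, which falls in cycle 3.
        return CYCLE_3
    # Rotate the single set bit left through the three positions.
    return ((state << 1) | (state >> 2)) & 0b111
```

After reset, repeated calls walk through cycle 1, cycle 2, and cycle 3 in order, with exactly one indicator set at any time.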

Only the NVAX internal clocks are used in the Cbox and BIU, while only the external clocks
are used in the pad ring. Through the use of C_BIU%CYCLE_1_H, C_BIU%CYCLE_2_H, and
C_BIU%CYCLE_3_H, the BIU is able to properly drive and receive the NDAL to and from the
pad ring.

There is a delay in the NDAL clocks as they travel from NVAX to the other NDAL chips and also
back to NVAX. The delay from the NVAX output pin, P%PHI12_OUT_H, to the NVAX input pin,
P%PHI12_IN_H, may be as little as 0 ns or as much as three internal NVAX phases (one NDAL
phase). This delay is shown graphically in Figure 13-23.
[Figure 13-23: timing diagram not reproduced. Over three NVAX cycles of four phases each, it
shows the internal clocks K_MCB%PHI_1_H through K_MCB%PHI_4_H; the NDAL clock outputs
P%PHI_12_OUT_H, P%PHI_23_OUT_H, P%PHI_34_OUT_H, and P%PHI_41_OUT_H; the earliest and
latest possible arrival of the clock inputs P%PHI_12_IN_H, P%PHI_23_IN_H, P%PHI_34_IN_H, and
P%PHI_41_IN_H; the resulting earliest and latest possible NDAL cycle (NDAL phases 1 through 4)
and P%NDAL_H valid windows; the interval in which the NVAX PHI23 latch is open to receive the
NDAL; and the NVAX phase 4, cycle 3 latch that brings the NDAL into internal NVAX time.]



KJMCB%PHI_1_H, K_MCB%PHI_2_H, KJMCB%PBX3_H, and KJHCB%PHI_4_H are the internal NVAX 
clocks which are used in the Cbox. Figure 13-23 shows that the NDAL clocks at the input pins 
(P%PHI12_IN_H, P%PHI23_IN_H, P%PHI34_IN_H, and P%PHI41_IN_H) may be delayed by 
up to three internal NVAX phases. The NDAL always operates with respect to the clocks as 
received at each NDAL driver/receiver, so if the NDAL clocks are delayed, the entire operation 
of the NDAL is delayed. 

The Cbox BIU is designed so that even if the NDAL is operating with three full phases of delay 
from the internal NVAX clocks, the BIU is able to drive and receive the NDAL properly. For 
example, P%NDAL_H<63:0> are valid at the beginning of NDAL phase 3. NVAX receives this 
bus using an NDAL latch which is open while P%PHI23_IN_H is asserted. The output of this 
latch is sent from the NVAX pad ring to a latch in the NVAX BIU which is open during NVAX 
phase 4 of BIU cycle 3. This timing allows two NVAX phases of delay to get the signal from the 
pad ring to the BIU. Thus, the NDAL is properly received for the entire range of possible NDAL 
delay. Once the NDAL is latched by the phase 4, cycle 3 latch, the BIU operates entirely using 
the internal NVAX clocks; the NDAL clocks are only used in the pad ring itself. 

13.4.4 The FILL_CAM 

The FILL_CAM has two entries, each of which is used for an outstanding read to memory or for 
a DREAD_LOCK in progress. Its depth limits the number of outstanding reads to memory at a 
time. The fields in each FILL_CAM entry are described in Table 13-29. 



Table 13-29: FILL_CAM Fields 

Field            Purpose 

ADDRESS<31:3>    Quadword-aligned address of the read request. 
RDLK             Indicates that a READ_LOCK is in progress. 
IREAD            This is an Istream read from the Mbox which may be aborted. 
OREAD            This is an outstanding OREAD; block ownership bit should be set when the 
                 fill returns. 
WRITE            This read was done for a write; write is waiting to be merged with the fill. 
TO_MBOX          Data is to be returned to the Mbox. 
RIP              READ invalidate pending. 
OIP              OREAD invalidate pending. 
DNF              Do not fill - data is not to be written into the cache or validated when the 
                 fill returns. 
RDLK_FL_DONE     Indicates that the last fill for a READ_LOCK arrived. 
REQ_FILL_DONE    Indicates that the requested quadword of data was received from the NDAL. 
COUNT<1:0>       Counts the number of fill quadwords that have been successfully returned. 
VALID            Indicates that the entry contains valid information. 


The FILL_CAM backpressures the Cbox control so that if it is full, any read or write request 
stalls until an entry is free. 

When the read miss first occurs and the FILL_CAM entry is loaded, the following bits are cleared: 
RIP, OIP, RDLK_FL_DONE, and REQ_FILL_DONE. VALID is set and the ADDRESS field is 
loaded. IREAD, RDLK, OREAD, WRITE, and TO_MBOX are loaded with the correct information. 
If the cache is off, in ETM, or the miss is for an I/O reference, DNF is set; otherwise it is cleared. 
COUNT is set to 0 if four fill quadwords are expected; it is set to 3 if only one quadword is 
expected. 

As each fill returns successfully, COUNT is incremented so that when the final fill returns and 
COUNT=3, the Cbox updates the tag store appropriately. 

If an abort request arrives from the Mbox, and the entry is marked IREAD, the TO_MBOX bit is 
cleared. When the data returns, it will be written into the backup cache (if DNF is not set) but 
it will not be sent to the Mbox. 
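
The entry load, fill counting, and abort rules above can be sketched as a small software model. This is an illustration only, not the hardware implementation: the class and function names (FillCamEntry, load_entry, fill_returned, abort_iread) and the cache_usable and io_ref parameters are hypothetical, and the model assumes that the "final fill" is the one that arrives while COUNT is already 3. 

```python
from dataclasses import dataclass


@dataclass
class FillCamEntry:
    """One FILL_CAM entry; field names follow Table 13-29."""
    address: int = 0            # byte address with bits <2:0> cleared (ADDRESS<31:3>)
    rdlk: bool = False
    iread: bool = False
    oread: bool = False
    write: bool = False
    to_mbox: bool = False
    rip: bool = False
    oip: bool = False
    dnf: bool = False
    rdlk_fl_done: bool = False
    req_fill_done: bool = False
    count: int = 0              # COUNT<1:0>
    valid: bool = False


def load_entry(address, *, iread=False, rdlk=False, oread=False, write=False,
               to_mbox=True, cache_usable=True, io_ref=False, quadwords=4):
    """Load an entry when a read miss first occurs.

    cache_usable=False stands for "cache off or in ETM" (hypothetical flag).
    """
    if quadwords not in (1, 4):
        raise ValueError("only 1 or 4 fill quadwords are modeled")
    return FillCamEntry(
        address=address & 0xFFFFFFF8,
        rdlk=rdlk, iread=iread, oread=oread, write=write, to_mbox=to_mbox,
        rip=False, oip=False, rdlk_fl_done=False, req_fill_done=False,
        dnf=(not cache_usable) or io_ref,
        count=0 if quadwords == 4 else 3,
        valid=True,
    )


def fill_returned(entry):
    """Account for one successful fill; return True if it was the final fill."""
    final = entry.count == 3
    if not final:
        entry.count += 1
    return final


def abort_iread(entry):
    """Mbox abort request: only an IREAD entry stops returning data to the Mbox."""
    if entry.iread:
        entry.to_mbox = False
```

In this sketch a four-quadword miss reports the final fill on its fourth call to fill_returned, while a one-quadword miss (COUNT preset to 3) reports it on the first call. 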

If a coherence request arrives from the NDAL which matches the address of a FILL_CAM entry, 
RIP or OIP may be set. Table 13-30 shows when each is set. 

Table 13-30: Cbox Response to Coherence Transactions to FILL_CAM Entries 

Coherence 
Transaction           State of OREAD bit    Cbox Action 

OREAD, READLK, any    OREAD set or clear    Set OIP. Send invalidate immediately to the 
write                                       Pcache. 
DREAD, IREAD          OREAD set             Set RIP. 
DREAD, IREAD          OREAD clear           Take no action. 


When all the fills for an outstanding miss have completed, a cache coherence transaction is 
initiated if either RIP or OIP is set and DNF is not set. This is done immediately after the fill 
and the validate of the cache are done, and cannot be interrupted by any other transaction. 

When a WRITE_UNLOCK completes successfully and RIP or OIP is set, the cache coherence 
transaction is initiated immediately. 
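
As an illustration, the decisions in Table 13-30 and the rule for starting the deferred cache coherence transaction can be written as two small functions. This is a sketch under stated assumptions: the entry is represented as a plain dictionary, the transaction names are simple strings (with "WRITE" standing in for "any write"), and the function names are hypothetical. 

```python
OWNERSHIP_TYPE = {"OREAD", "READLK", "WRITE"}   # first row of Table 13-30
READ_TYPE = {"DREAD", "IREAD"}                  # second and third rows


def respond_to_coherence(entry, transaction):
    """Update RIP/OIP for a coherence request that matches a FILL_CAM entry.

    Returns the list of immediate actions (hypothetical action names).
    """
    actions = []
    if transaction in OWNERSHIP_TYPE:
        entry["oip"] = True
        actions.append("invalidate_pcache")     # sent immediately
    elif transaction in READ_TYPE:
        if entry["oread"]:
            entry["rip"] = True                 # OREAD clear: take no action
    else:
        raise ValueError(f"unknown coherence transaction: {transaction}")
    return actions


def needs_coherence_transaction(entry):
    """After all fills complete: start a cache coherence transaction?"""
    return (entry["rip"] or entry["oip"]) and not entry["dnf"]
```

The second function covers only the fill-completion case; the WRITE_UNLOCK case and the error cases described in this section are not modeled. 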

There are several error cases where RIP or OIP may be set, indicating the need for a cache 
coherence transaction, but the Cbox will not perform the transaction, possibly causing the system 
element to time out. These cases are as follows: 

1. The fill sequence fails by ending in RDE or timeout. If the fill was meant for the Pcache and 
ends in an error, the Pcache invalidates itself. 

2. A READ_LOCK sequence does not conclude with a WRITE_UNLOCK but with a 
write-one-to-clear to the RDLK bit in CEFSTS. 

As shown in the table above, when an ownership-type coherence transaction arrives, an invalidate 
is sent immediately to the Pcache and OIP is set. When the cache coherence transaction to the tag 
store is processed immediately after all the fills have arrived, a second invalidate will be issued 
to the Pcache, although it is not strictly necessary. The first invalidate is sent immediately so 
that the block in the Pcache is invalidated as soon as possible, to prevent the stale data from 
being accessed before the rest of the fills return. 

13.4.4.1 Block-conflict in the FILL_CAM 

Every new read or write from the Mbox is checked against valid FILL_CAM entries so that any 
transaction with a cache block conflict is stalled until all the fills return for the outstanding 
read, clearing the conflicting FILL_CAM entry. In this way, cache accesses to a block with an 
outstanding fill are prevented. 

When the cache is off or in ETM, writes are not checked for block conflict but are sent immediately 
to memory. 
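
The block-conflict check can be sketched as follows. The sketch assumes a 32-byte cache block (four 8-byte fill quadwords, as implied by the four-quadword fill sequence), so that two addresses conflict when bits <31:5> match; that block size, the tuple representation of FILL_CAM entries, and the function names are assumptions made for illustration. 

```python
BLOCK_SHIFT = 5   # assumption: 32-byte block = four 8-byte fill quadwords


def same_block(addr_a, addr_b):
    """True when both byte addresses fall in the same cache block."""
    return (addr_a >> BLOCK_SHIFT) == (addr_b >> BLOCK_SHIFT)


def must_stall(kind, address, fill_cam, cache_on=True, etm=False):
    """Decide whether a new Mbox read or write stalls on a block conflict.

    kind is "read" or "write"; fill_cam is an iterable of (valid, address)
    pairs, one per FILL_CAM entry.
    """
    if kind not in ("read", "write"):
        raise ValueError("kind must be 'read' or 'write'")
    if kind == "write" and (not cache_on or etm):
        return False          # writes go straight to memory, unchecked
    return any(valid and same_block(address, entry_addr)
               for valid, entry_addr in fill_cam)
```

For example, with one valid entry at 1008 (hex), a read of 1018 (hex) stalls because it falls in the same assumed block, while a read of 1020 (hex) does not. 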

13.4.4.2 The FILL_CAM and DREAD_LOCKs 

Each DREAD_LOCK from the Mbox is held in the FILL_CAM until the associated 
WRITE_UNLOCK completes, regardless of whether the read hits or misses in the backup 
cache. Only one DREAD_LOCK/WRITE_UNLOCK transaction is in progress at a time. A 
DREAD_LOCK which does not produce an owned hit in the backup cache results in an OREAD 
on the NDAL to gain ownership of the block so that the write can be done. 

By holding the DREAD_LOCK address in the FILL_CAM from the time the DREAD_LOCK starts 
until the WRITE_UNLOCK completes, the Cbox prevents the block from being written back to 
memory during that time. This guarantees that the DREAD_LOCK/WRITE_UNLOCK sequence 
will not be interrupted by another CPU requesting ownership of the block. The CPU depends on 
no other state in memory once the OREAD is done in order to complete the WRITE_UNLOCK, 
so no deadlock can arise. 

Every new transaction is checked against the FILL_CAM to ensure that the block is not 
inaccessible due to an outstanding fill or DREAD_LOCK. 

If either RDLK bit is set in the FILL_CAM, IREADs and DREADs are not processed. Incoming 
fills and coherency transactions continue normally, and the WRITE_QUEUE is serviced normally. 
The only transaction which should appear in the WRITE_QUEUE (when either RDLK bit is set) 
is the WRITE_UNLOCK corresponding to the READ_LOCK. 

The one exception to this is when the READ_LOCK terminates in an error. In this case 
an IPR_WRITE to CEFSTS is the next transaction which appears in the WRITE_QUEUE. 
Specifically, a write-one-to-clear of the RDLK bit in CEFSTS has the side effect of clearing any 
RDLK bit in the FILL_CAM which is set. If one of the RDLK bits is cleared in the FILL_CAM, 
hardware also clears the corresponding valid bit, freeing the entry for a new transaction. 

When the RDLK bit is cleared by a normal WRITE_UNLOCK, a cache coherency transaction is 
initiated if RIP or OIP was set on the entry. RIP and OIP are ignored when the RDLK bit is 
cleared by the "IPR write unlock" method. 

13.5 Cbox Internal Processor Registers 

The processor registers that are implemented by the NVAX Cbox are logically divided into three 
groups, as follows: 

• Normal — Those IPRs that address individual registers in the NVAX CPU chip or system 
environment. 

• Bcache tag IPRs — The read-write block of IPRs that allow direct access to the Bcache tags. 

• Bcache deallocate IPRs — The write-only block of IPRs by which a Bcache block may be 
deallocated. 

Each group of IPRs is distinguished by a particular pattern of bits in the IPR address, as shown 
in Figure 13-24. 

Figure 13-24: IPR Address Space Decoding as seen by Software 

Normal IPR Address 

 31       25 24 23                                08 07         00
+-----------+--+------------------------------------+-------------+
|    SBZ    | 0|                SBZ                 | IPR Number  |
+-----------+--+------------------------------------+-------------+

Bcache Tag IPR Address 

 31       25 24 23 22 21 20                       05 04         00
+-----------+--+--+--+--+---------------------------+-------------+
|    SBZ    | 1| 0| 0| x|     Bcache Tag Index      |     SBZ     |
+-----------+--+--+--+--+---------------------------+-------------+

Bcache Deallocate IPR Address 

 31       25 24 23 22 21 20                       05 04         00
+-----------+--+--+--+--+---------------------------+-------------+
|    SBZ    | 1| 0| 1| x|Bcache Tag Deallocate Index|     SBZ     |
+-----------+--+--+--+--+---------------------------+-------------+

The numeric range for each of the three groups is shown in Table 13-31. 
Table 13-31: IPR Address Space Decoding 

                                 IPR Address 
IPR Group           Mnemonic¹    Range (hex)            Contents 

Normal                           00000000..000000FF     256 individual IPRs. 
Bcache Tag          BCTAG        01000000..011FFFE0²    64k Bcache tag IPRs, each separated by 
                                                        20(hex) from the previous one. 
Bcache Deallocate   BCFLUSH      01400000..015FFFE0²    64k Bcache tag deallocate IPRs, each 
                                                        separated by 20(hex) from the previous one. 

¹ The mnemonic is for the first IPR in the block. 

² Unused fields in the IPR addresses for these groups should be zero. Neither hardware nor 
microcode detects and faults on an address in which these bits are non-zero. Although 
non-contiguous address ranges are shown for these groups, the entire IPR address space maps 
into one of these groups. If these fields are non-zero, the operation of the CPU is UNDEFINED. 



NOTE 

The address ranges shown above are those used by the programmer. When processing 
normal IPRs, the microcode shifts the IPR number left by 2 bits for use as an IPR 
command address. This positions the IPR number to bits <9:2> and modifies the 
address range as seen by the hardware to 0..3FC, with bits <1:0>=00. No shifting 
is performed for the other groups of IPR addresses. 
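
The shift described in the NOTE amounts to the following arithmetic. The function name is hypothetical; the sketch only restates the left shift by two bits for normal IPRs. 

```python
def cbox_ipr_address(ipr_number):
    """Return the IPR command address the hardware sees for a normal IPR.

    The microcode shifts the 8-bit IPR number left by 2, placing it in
    bits <9:2> with bits <1:0> = 00.
    """
    if not 0 <= ipr_number <= 0xFF:
        raise ValueError("normal IPR numbers are 00..FF (hex)")
    return ipr_number << 2
```

For example, IPR A0 (hex) becomes 280 (hex), and the largest normal IPR number, FF (hex), becomes 3FC (hex). 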

Because of the sparse addressing used for IPRs in groups other than the normal group, valid IPR 
addresses are not separated by one. Rather, valid IPR addresses are separated by 20(hex). For 
example, the IPR address for Bcache tag 0 is 01000000 (hex), and the IPR address for Bcache tag 
1 is 01000020 (hex). In this group, bits <4:0> of the IPR address are ignored, so IPR numbers 
01000001 through 0100001F all address Bcache tag 0. 
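
A minimal sketch of the group decoding, as seen by software, is shown below. It assumes that the index occupies bits <20:5> (which follows from the 64k entries spaced 20 (hex) apart) and that bits <4:0> are ignored in the deallocate group in the same way as in the tag group; the function name and the decision to raise an error for other addresses are choices of this sketch, since the specification defines such accesses as UNDEFINED. 

```python
def decode_ipr_address(addr):
    """Classify a software IPR address; return (group, number_or_index)."""
    if 0 <= addr <= 0xFF:
        return ("normal", addr)
    if 0x01000000 <= addr <= 0x011FFFFF:
        return ("bcache_tag", (addr >> 5) & 0xFFFF)
    if 0x01400000 <= addr <= 0x015FFFFF:
        return ("bcache_deallocate", (addr >> 5) & 0xFFFF)
    # Outside the documented ranges the CPU's operation is UNDEFINED;
    # this model simply refuses the address.
    raise ValueError(f"address {addr:08X} is outside the documented ranges")
```

With this decoding, 01000000 through 0100001F (hex) all select Bcache tag 0, and 01000020 (hex) selects Bcache tag 1. 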

Processor registers in all groups except the normal group are processed entirely by the NVAX 
CPU chip and will never appear on the NDAL. This is also true for a number of the IPRs in 
the normal group. IPRs in the normal group that are not processed by the NVAX CPU chip are 
converted into I/O space references and passed to the system environment via a read or write 
command on the NDAL. 

The processor registers implemented by the NVAX Cbox are shown in Table 13-32. 

Table 13-32: Cbox Processor Registers 

                                                             Number           Cbox    Cbox 
Register Name                                 Mnemonic    (Dec)  (Hex)  Type  Loc¹    Addr² 

Cbox Control Register                         CCTL        160    A0     RW    Abus    280 
Reserved for Cbox                                         161    A1 
Bcache Data ECC                               BCDECC      162    A2     W     Dbus    288 
Bcache Error Tag Status                       BCETSTS     163    A3     RW    Abus    28C 
Bcache Error Tag Index                        BCETIDX     164    A4     R     Abus    290 
Bcache Error Tag                              BCETAG      165    A5     R     Abus    294 
Bcache Error Data Status                      BCEDSTS     166    A6     RW    Dbus    298 
Bcache Error Data Index                       BCEDIDX     167    A7     R     Abus    29C 
Bcache Error Data ECC                         BCEDECC     168    A8     R     Dbus    2A0 
Reserved for Cbox                                         169    A9 
Reserved for Cbox                                         170    AA 
Fill Error Address                            CEFADR      171    AB     R     Abus    2AC 
Fill Error Status                             CEFSTS      172    AC     RW    Abus    2B0 
Reserved for Cbox                                         173    AD 
NDAL Error Status                             NESTS       174    AE     RW    BIU     2B8 
Reserved for Cbox                                         175    AF 
NDAL Error Output Address                     NEOADR      176    B0     R     BIU     2C0 
Reserved for Cbox                                         177    B1 
NDAL Error Output Command                     NEOCMD      178    B2     R     BIU     2C8 
Reserved for Cbox                                         179    B3 
NDAL Error Data High                          NEDATHI     180    B4     R     BIU     2D0 
Reserved for Cbox                                         181    B5 
NDAL Error Data Low                           NEDATLO     182    B6     R     BIU     2D8 
Reserved for Cbox                                         183    B7 
NDAL Error Input Command                      NEICMD      184    B8     R     BIU     2E0 
Reserved for Cbox                                         185    B9 
Reserved for Cbox                                         186    BA 
Reserved for Cbox                                         187    BB 
Reserved for Cbox                                         188    BC 
Reserved for Cbox                                         189    BD 
Reserved for Cbox                                         190    BE 
Reserved for Cbox                                         191    BF 
Bcache Tag (01000000 - 011FFFE0 HEX)          BCTAG                     RW    Abus 
Bcache Deallocate (01400000 - 015FFFE0 HEX)   BCFLUSH                   W     Abus 

¹ Each Cbox IPR is located in the Cbox Abus datapath, the Cbox Dbus datapath, or the Cbox BIU datapath. 

² The address given is as it is seen in the Cbox, after microcode has shifted the software address left by two bits. 

IPRs in the system and in the Cbox are accessed through IPR_READs and IPR_WRITEs from 
the Mbox to the Cbox. When the Cbox recognizes a valid IPR_READ on M%S6_CMD_H<4:0>, it 
loads the read into the DREAD_LATCH to be processed. The Mbox guarantees that only one 
DREAD or IPR_READ may be outstanding at a time, so that the DREAD_LATCH will not be 
overwritten. A valid IPR_WRITE is loaded into the WRITE_PACKER and proceeds immediately 
to the WRITE_QUEUE. 

All IPR reads and writes to the Cbox flush the WRITE_QUEUE before completing. Any 
IPR_READ sets DWR_CONFLICT bits in all valid entries in the WRITE_QUEUE so that any 
preceding writes of any kind will complete before the IPR_READ. An IPR_WRITE is placed in 
the WRITE_QUEUE after the preceding writes so that the ordering takes place naturally. If a 
read arrives after the IPR_WRITE and before it has been processed, the WRITE_QUEUE conflict 
bits are set so that the WRITE_QUEUE takes priority over the read. 

If the IPR_READ addresses one of the Cbox registers, the Cbox returns the data from the register 
through the CM_OUT_LATCH, in the usual way that it would return data for a read hit. The only 
difference is that it returns just one quadword or less of data, rather than the usual 4 quadwords. 
The Cbox asserts C%LAST_FILL_H so the Mbox does not expect any more fills. 

If a write-only Cbox register is read, the Cbox returns UNPREDICTABLE data. Reading an 
unimplemented Cbox register returns UNPREDICTABLE data; if an unimplemented register is 
written, the write is discarded by the Cbox and normal operation continues. 

If the Cbox receives an IPR access to a legal IPR address which is not within the Cbox block of 
IPR addresses, it converts it to an I/O space read or write. The Cbox merges the IPR address 
with E1000000 hex, effectively adding the base I/O space address of the IPR block to the IPR 
address. This is done in hardware by forcing bits <31:29> and bit <24> to 1's. (The other upper 
bits are expected to be received as zero's.) 
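
The merge with E1000000 (hex) is a bitwise OR, which can be sketched as follows. The function and constant names are hypothetical, and the sketch takes the IPR address exactly as the Cbox receives it, without modeling any earlier shifting. 

```python
IPR_IO_BASE = 0xE1000000   # bits <31:29> and bit <24> forced to 1


def ipr_to_io_address(ipr_address):
    """Convert an IPR address outside the Cbox block to an I/O space address."""
    return (ipr_address | IPR_IO_BASE) & 0xFFFFFFFF
```

Because the other upper bits are expected to arrive as zeros, forcing these four bits to 1 has the same effect as adding the base address. 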

From this point on, the transaction is treated as an I/O space transaction by the Cbox. It sends 
the request off-chip to the NDAL through the NON_WRITEBACK_QUEUE. When the fill data 
returns, the data is returned to the Mbox but is not cached by the Cbox. I/O space reads and 
writes are never cached in the primary cache or the backup cache. 




13.5.1 Cbox Control IPR (CCTL) 

CCTL is a read/write register which contains bits controlling the behavior of the Cbox. The bits 
are detailed in Figure 13-25 and Table 13-33. 



Figure 13-25: IPR A0 (hex), CCTL 

Bits     Field 

31       HW_ETM 
30       SW_ETM 
29:17    x 
16       FORCE_NDAL_PERR 
15:14    PM_HIT_TYPE 
13:11    PM_ACCESS_TYPE 
10       DISABLE_PACK 
9        TIMEOUT_TEST 
8        SW_ECC 
7        DISABLE_ERRORS 
6        FORCE_HIT 
5:4      SIZE 
3:2      DATA_SPEED 
1        TAG_SPEED 
0        ENABLE 


Table 13-33: CCTL Field Descriptions 

Name              Extent   Type   Description 

ENABLE            0        RW,0   Turns the Bcache on and off. 
TAG_SPEED         1        RW,0   Controls time to access the tag RAMs. 
DATA_SPEED        3:2      RW,0   Controls time to access the data RAMs. 
SIZE              5:4      RW,0   Selects one of four backup cache sizes. 
FORCE_HIT         6        RW,0   Forces memory reads and writes to hit in the backup cache. 
DISABLE_ERRORS    7        RW,0   Disables all backup cache ECC errors. 
SW_ECC            8        RW,0   Enables use of ECC check bits as given by software for the tag 
                                  store and data RAMs. 
TIMEOUT_TEST      9        RW,0   Puts the Cbox read timeout counters into test mode. 
DISABLE_PACK      10       RW,0   Disables the Cbox write packer. 
PM_ACCESS_TYPE    13:11    RW,0   Selects type of Bcache access for the performance monitoring 
                                  hardware. 
PM_HIT_TYPE       15:14    RW,0   Selects type of Bcache hit for the performance monitoring 
                                  hardware. 
FORCE_NDAL_PERR   16       RW,0   Forces a parity error in the command field of the next outgoing 
                                  NDAL transaction. 
SW_ETM            30       RW,0   Used by software to put the backup cache into ETM. 
HW_ETM            31       WC     Used by hardware to put the backup cache into ETM. 

13.5.1.1 ENABLE 

When ENABLE = 1, the backup cache is enabled for operation. When ENABLE=0, the backup 
cache is off and all references are treated as misses and are not looked up in the backup cache. 
When the backup cache is off, FORCE_HIT, SW_ETM, and HW_ETM are ignored. Reset clears 
this bit so that the Bcache is off when the chip is reset. 

13.5.1.2 TAG_SPEED 

The Cbox provides this bit to select the speed of the tag RAMs. Table 13-34 shows the relationship 
of the value of TAG_SPEED and the access time of the tag RAMs, given in NVAX cycles. This 
is the total RAM access time including internal Cbox processing time. For information on the 
actual cache RAM access times required, see Section 13.3.1. Reset clears this bit so that the tag 
access repetition rate is 3 cycles when the chip is reset. 



Table 13-34: TAG_SPEED 

              tag read    tag write 
TAG_SPEED     rep rate    rep rate     comments 

0             3 cycles    3 cycles 
1             4 cycles    4 cycles     may not be used when DATA_SPEED=00 



13.5.1.3 DATA_SPEED 

The Cbox provides these bits to select the speed of the data RAMs. Table 13-35 shows the relationship 
of the value of DATA_SPEED and the access time of the data RAMs, given in NVAX cycles. This 
is the total RAM access time including internal Cbox processing time. For information on the 
actual cache RAM access times required, see Section 13.3.1. Reset clears these bits so that the 
data read rep rate is 2 cycles when the chip is reset. 

Table 13-35: DATA_SPEED 

                   data read    data write 
DATA_SPEED<1:0>    rep rate     rep rate      comments 

00                 2 cycles     3 cycles      may not be used when TAG_SPEED=1 
01                 3 cycles     4 cycles 
10                 4 cycles     5 cycles 
11                 unused¹      unused¹ 

¹ Cbox response in this mode is UNDEFINED. 



The fastest DATA_SPEED may not be selected with the slowest TAG_SPEED, for in this 
configuration the result of the cache hit calculation is not known in time for the Cbox state 
machines to operate correctly. 

13.5.1.4 SIZE 

Four backup cache sizes are selectable by using the SIZE bits, as shown in Table 13-36. These 
bits are cleared on reset so that when the chip is reset, the 128-kilobyte cache is selected by 
default. 

Table 13-36: SIZE 

SIZE<1:0> Backup cache size 

00 128 kilobytes 

01 256 kilobytes 

10 512 kilobytes 

11 2 megabytes 
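
The TAG_SPEED, DATA_SPEED, and SIZE encodings in Tables 13-34 through 13-36 can be captured in lookup tables, together with the restriction on combining the fastest data speed with the slowest tag speed. The names below are hypothetical and the sketch is only a software restatement of the tables, not a description of the hardware. 

```python
TAG_REP_RATE = {0: (3, 3), 1: (4, 4)}                            # (read, write) cycles
DATA_REP_RATE = {0b00: (2, 3), 0b01: (3, 4), 0b10: (4, 5)}       # 0b11 is unused
SIZE_KILOBYTES = {0b00: 128, 0b01: 256, 0b10: 512, 0b11: 2048}


def rep_rates(tag_speed, data_speed):
    """Return the repetition rates, in NVAX cycles, for a legal setting."""
    if tag_speed not in TAG_REP_RATE:
        raise ValueError("TAG_SPEED is a single bit")
    if data_speed not in DATA_REP_RATE:
        raise ValueError("DATA_SPEED=11 is unused; Cbox response is UNDEFINED")
    if tag_speed == 1 and data_speed == 0b00:
        raise ValueError("DATA_SPEED=00 may not be used when TAG_SPEED=1")
    tag_read, tag_write = TAG_REP_RATE[tag_speed]
    data_read, data_write = DATA_REP_RATE[data_speed]
    return {"tag_read": tag_read, "tag_write": tag_write,
            "data_read": data_read, "data_write": data_write}


def bcache_size_bytes(size):
    """Return the backup cache size selected by SIZE<1:0>, in bytes."""
    return SIZE_KILOBYTES[size] * 1024
```

The reset state (all of these fields clear) corresponds to rep_rates(0, 0b00) and a 128-kilobyte cache. 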



13.5.1.5 FORCE_HIT 

When FORCE_HIT is set, all memory references, both Dstream and Istream reads and writes, 
are forced to hit in the backup cache. The tag store state is not changed but data is always read 
or written. Reset clears this bit. 

The backup cache must be enabled when the cache is used in FORCE_HIT mode. 
This mode is expected to be used for testing purposes only. 

13.5.1.6 DISABLE_ERRORS 

When DISABLE_ERRORS is set, all ECC errors from the backup cache are ignored. Neither 
C%CBOX_H_ERR_H nor C%CBOX_S_ERR_H is asserted. C%CBOX_HARD_ERR_H is not asserted for 
data returning to the Mbox. The backup cache data syndrome is loaded into BCEDECC on 
every cache access; the behavior of BCETSTS, BCETIDX, BCETAG, BCEDSTS, and BCEDIDX 
is unpredictable. This feature allows operation of the backup cache even if the error detection 
and correction logic is faulty. It also allows access to the backup cache syndrome for the purposes 
of testing the ECC logic. Reset clears this bit. 

13.5.1.7 SW_ECC 

When SW_ECC is clear, the Cbox generates correct ECC check bits for all writes to the tag store 
and data RAMs. When SW_ECC is set, the Cbox does not generate the check bits when the 
backup cache is written with data, but uses the check bit values as specified by software and 
written in the BCDECC register. Note that if a read or write reference misses in the Bcache 
when SW_ECC is set, all four fills will be written with the ECC given in BCDECC when they 
return. 

When SW_ECC is set and the tag store is written using an IPR write to BCTAG, the Cbox uses 
the check bits for the tag store as given through the IPR write. The value of SW_ECC does not 
affect tag store transactions other than IPR writes. 

Reset clears this bit. 

13.5.1.8 TIMEOUT_TEST 

When TIMEOUT_TEST is set, the Cbox uses the internal clock to clock its read timeout counters. 
When TIMEOUT_TEST is clear, the Cbox uses E%TIMEOUT_BASE_H to clock its timeout counters. 
Reset clears this bit. 

13.5.1.9 DISABLE_PACK 

When DISABLE_PACK is set, the Cbox does not pack quadword writes together. Instead, the 
write packer passes every write it receives directly into the write queue. When the bit is clear, 
the Cbox write packer operates normally. DISABLE_PACK is intended for testing purposes only. 
Reset clears this bit. 

13.5.1.10 PM_ACCESS_TYPE 

PM_ACCESS_TYPE selects the type of Bcache access for the performance monitoring hardware. 
The function of these three bits is fully described in Section 13.11. Reset clears these bits. 

13.5.1.11 PM_HIT_TYPE 

PM_HIT_TYPE selects the type of Bcache hit for the performance monitoring hardware. The 
function of these two bits is fully described in Section 13.11. Reset clears these bits. 

13.5.1.12 FORCE_NDAL_PERR 

When a 1 is written to FORCE_NDAL_PERR, a parity error is caused in the command field 
of the next outgoing NDAL transaction. The parity error is caused by inverting the value of 
P%PARITY_H<2>. 

Setting this bit causes only one parity error. The parity error does not occur until NVAX is granted 
the NDAL for its next outgoing transaction. If software sets FORCE_NDAL_PERR and clears it 
before NVAX is granted the bus, NVAX will still force a parity error on the next transaction. In 
order to produce a second parity error on the bus, FORCE_NDAL_PERR must be cleared and set 
again by software. 

Reset clears this bit. 
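
The one-shot behavior of FORCE_NDAL_PERR can be illustrated with a small model. This is a sketch under assumptions: it treats a 0-to-1 transition of the bit as the event that arms a single parity error (the text states only that the bit must be cleared and set again for a second error), it represents the parity signals as an integer whose bit 2 is P%PARITY_H<2>, and the class and method names are hypothetical. 

```python
class ForceNdalPerr:
    """Software model of the FORCE_NDAL_PERR one-shot."""

    def __init__(self):
        self.bit = False      # value of CCTL<16>
        self.armed = False    # a parity error is pending for the next grant

    def reset(self):
        """Reset clears the bit."""
        self.bit = False
        self.armed = False

    def write(self, value):
        """Software write of FORCE_NDAL_PERR."""
        value = bool(value)
        if value and not self.bit:
            self.armed = True         # assumption: 0 -> 1 transition arms
        self.bit = value              # clearing does not cancel a pending error

    def grant(self, parity):
        """NVAX is granted the NDAL; return the parity value actually driven."""
        if self.armed:
            self.armed = False
            return parity ^ 0b100     # invert P%PARITY_H<2>
        return parity
```

In this model, setting and then clearing the bit before the grant still produces one inverted parity bit on the next outgoing transaction, and later transactions are unaffected until the bit is set again. 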

13.5.1.13 SW_ETM 

This is a software-writeable bit to put the backup cache into Error Transition Mode. When the 
cache is on and software ascertains that the cache is producing errors, it can set this bit in order 
to turn off the cache while ensuring cache coherency. Software can then flush owned data through 
use of the Bcache Deallocate IPR, BCFLUSH. In this manner, the unique data can be extracted 
from the cache before it is turned off completely. Reset clears this bit. 

13.5.1.14 HW_ETM 

Hardware sets this bit when an uncorrectable error is detected in the backup cache tag store or 
data RAMs, unless DISABLE_ERRORS is set. Hardware sets the bit to put the backup cache into 
Error Transition Mode. 

Software clears HW_ETM by writing a one to it. 
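
The CCTL field positions described in this section can be summarized in a short encode/decode sketch. The field positions come from Table 13-33; the function names and the dictionary layout are hypothetical. Note that HW_ETM is a write-one-to-clear bit, so a 1 in that position of a written value clears the bit rather than setting it. 

```python
# name: (least significant bit, width in bits), from Table 13-33
CCTL_FIELDS = {
    "ENABLE": (0, 1),
    "TAG_SPEED": (1, 1),
    "DATA_SPEED": (2, 2),
    "SIZE": (4, 2),
    "FORCE_HIT": (6, 1),
    "DISABLE_ERRORS": (7, 1),
    "SW_ECC": (8, 1),
    "TIMEOUT_TEST": (9, 1),
    "DISABLE_PACK": (10, 1),
    "PM_ACCESS_TYPE": (11, 3),
    "PM_HIT_TYPE": (14, 2),
    "FORCE_NDAL_PERR": (16, 1),
    "SW_ETM": (30, 1),
    "HW_ETM": (31, 1),
}


def decode_cctl(value):
    """Split a 32-bit CCTL value into its named fields."""
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in CCTL_FIELDS.items()}


def encode_cctl(**fields):
    """Build a 32-bit CCTL value from named fields; unnamed fields are 0."""
    value = 0
    for name, field_value in fields.items():
        lsb, width = CCTL_FIELDS[name]
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        value |= field_value << lsb
    return value
```

For example, encode_cctl(ENABLE=1, SIZE=0b10) yields a value with bit 0 set and bits <5:4> equal to 10, selecting an enabled 512-kilobyte backup cache. 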



DIGITAL CONFIDENTIAL 



The Cbox 13-63 



NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 



13.5.2 IPR A2 (hex), BCDECC 

Figure 13-26: Format of the BCDECC 

Bits     Field 

31:26    x 
25:22    ECCHI 
21:10    x 
9:6      ECCLO 
5:0      x 



The ECCHI field corresponds to data check bits <7:4>. The ECCLO field corresponds to data 
check bits <3:0>. 

This register is written by software. It is a write only register. 

Software writes BCDECC using an IPR_WRITE. The value in the register is then used to explicitly 
write ECC into the data RAMs during any write of the data RAMs, but only if SW_ECC is set in 
the control register. If SW_ECC is not set, hardware ignores the value in BCDECC and generates 
the check bits to be written using the ECC syndrome generator. 

BCDECC is expected to be used during testing only. It allows software to explicitly write bad 
ECC into the data RAMs in order to test Cbox error detection logic. Note that BCDECC will 
be used as the source of the ECC check bits during any write of the backup cache data RAMs, 
including those done for fills. Cache transactions must be carefully controlled while this register 
is being used in order to obtain the expected results. BCDECC will probably be most useful when 
used in FORCE_HIT mode, so that no fills are generated. 

Reset does not affect this register. 

13.5.3 Backup Cache Tag Store Error Registers (BCETSTS, BCETIDX, BCETAG) 

On some tag store errors, hardware overwrites the corrupted values so that they cannot be 
diagnosed by reading the tag store directly. For this reason there are tag store error registers 
which hold the relevant data, so that software can understand the problem. 

The tag store error registers are loaded when any tag store error occurs. Their contents are 
not changed during reset. The status bits in BCETSTS indicate what sort of error happened. 
Correctable errors are indicated by the CORR bit; the UNCORR and BAD_ADDR errors are both 
uncorrectable-type errors. 

If no error is yet logged in the registers, the registers are loaded when either a correctable or an 
uncorrectable error occurs. Once the registers are loaded with information from a correctable 
error, they are locked against further correctable errors, and are only loaded again if an 
uncorrectable error happens. At this time either UNCORR or BAD_ADDR is set. The LOCK 
bit in BCETSTS is set as well. In this way, information from the first correctable error is held in 
the registers, and is only overwritten if an uncorrectable error happens later. 

The error registers are cleared and unlocked by software. If the error registers hold data from 
a non-correctable error and yet another non-correctable error happens before the error registers 
are unlocked, the LOST_ERR bit is set. This indicates to software that it does not have sufficient 
information in the error registers to recover from all uncorrectable errors which have occurred. 
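
The loading and locking rules for the tag store error registers can be sketched as a small state model. This is an illustration under assumptions: the logged value stands in for the contents of BCETIDX, BCETAG, and the TS_CMD field; a correctable error that arrives while the registers are locked is assumed to be ignored entirely; and the class and method names are hypothetical. 

```python
class TagErrorRegs:
    """Software model of the BCETSTS status bits and register locking."""

    BIT_NAMES = ["lock", "corr", "uncorr", "bad_addr", "lost_err"]  # bits 0..4

    def __init__(self):
        self.lock = self.corr = self.uncorr = False
        self.bad_addr = self.lost_err = False
        self.logged = None    # stand-in for BCETIDX/BCETAG/TS_CMD contents

    def error(self, kind, info):
        """Record a tag store error; kind is 'corr', 'uncorr' or 'bad_addr'."""
        if kind == "corr":
            # Loaded only if nothing is logged yet.
            if not (self.lock or self.corr):
                self.corr = True
                self.logged = info
        elif kind in ("uncorr", "bad_addr"):
            if self.lock:
                self.lost_err = True          # state for this error is lost
            else:
                setattr(self, kind, True)     # overwrites a correctable log
                self.lock = True
                self.logged = info
        else:
            raise ValueError(f"unknown error kind: {kind}")

    def write_one_to_clear(self, mask):
        """Software clears status bits by writing ones to bits <4:0>."""
        for position, name in enumerate(self.BIT_NAMES):
            if mask & (1 << position):
                setattr(self, name, False)
```

In this model a correctable error followed by an uncorrectable error leaves both CORR and UNCORR set with the registers describing the uncorrectable error, and a further uncorrectable error sets only LOST_ERR. 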



13.5.3.1 Bcache Error Tag Status (BCETSTS) 

The BCETSTS register gives the general status of an error in the tag store, indicating the 
transaction which was taking place at the time and the type of error. The register is written 
by hardware and read by software. Hardware does not clear the error bits in this register; this 
must be done by software using write-one-to-clear to the bottom 5 bits of the register. The contents 
of the register are not changed during reset. 



Figure 13-27: IPR A3 (hex), BCETSTS 

Bits     Field 

31:10    x 
9:5      TS_CMD 
4        LOST_ERR 
3        BAD_ADDR 
2        UNCORR 
1        CORR 
0        LOCK 



Table 13-37: BCETSTS Field Descriptions 

Name        Extent   Type   Description 

LOCK        0        WC     Lock bit. Indicates that BCETSTS (except LOST_ERR), 
                            BCETIDX, and BCETAG are locked. 
CORR        1        WC     Indicates that a correctable ECC error was encountered. 
UNCORR      2        WC     Indicates that an uncorrectable ECC error was encountered. 
BAD_ADDR    3        WC     Indicates that an addressing error was detected. This is an 
                            uncorrectable error. 
LOST_ERR    4        WC     Indicates that more than one uncorrectable error occurred which 
                            was not recorded in the error registers. 
TS_CMD      9:5      R      Indicates what tag store command was being processed at the 
                            time the error occurred. 



13.5.3.1.1 LOCK 

Whenever the tag store error registers are locked due to an uncorrectable error, the LOCK bit is 
set. At this time either UNCORR or BAD_ADDR is also set to indicate the type of uncorrectable 
error. When the LOCK bit is set, the BCETSTS, BCETIDX, and BCETAG registers are all locked. 
Clearing the lock bit unlocks all three registers. The LOCK bit is set by hardware and it is cleared 
by software. It is a write-one-to-clear bit. 

13.5.3.1.2 CORR 

CORR is set when the tag store ECC decoder detects a correctable error. When this occurs, the 
Bcache Tag Store Error registers are loaded and are locked against further correctable errors. 
They are not locked against an uncorrectable error which follows. BCETSTS<LOCK> is not set. 

If a correctable error is followed by an uncorrectable error, the CORR bit remains set. 

The CORR bit is set by hardware and it is cleared by software. It is a write-one-to-clear bit. 

13.5.3.1.3 UNCORR 

UNCORR is set when the tag store ECC decoder detects an uncorrectable error. When this occurs, 
the Bcache Tag Store Error registers are loaded and locked. 

The UNCORR bit and the BAD_ADDR bit are exclusive: only one of them is set for a given error 
which sets the LOCK bit. If the other type of error occurs later, the related bit is not set since 
the register is already locked. In this case, LOST_ERR is set instead.

The UNCORR bit is set by hardware and it is cleared by software. It is a write-one-to-clear bit. 

13.5.3.1.4 BAD_ADDR

BAD_ADDR is set when the tag store ECC decoder detects an error in the address bit, indicating
some problem with the address lines going to the tag RAMs. This is an uncorrectable error; thus,
when it occurs, the Bcache Tag Store Error registers are loaded and locked.

The UNCORR bit and the BAD_ADDR bit are exclusive: only one of them is set for a given error 
which sets the LOCK bit. If the other type of error occurs later, the related bit is not set since 
the register is already locked. In this case, LOST_ERR is set instead. 

The BAD_ADDR bit is set by hardware and it is cleared by software. It is a write-one-to-clear
bit.

13.5.3.1.5 LOST_ERR 

LOST_ERR indicates that after the first uncorrectable error was recorded in the tag store error
registers, an additional uncorrectable error occurred for which state was not saved. LOST_ERR
is set by hardware and is cleared by software. It is a write-one-to-clear bit.

13.5.3.1.6 TS_CMD 

The TS_CMD field indicates what the tag store was doing when the error was detected. Its values 
are listed in Table 13-38. 



Table 13-38: Interpretation of TS_CMD

TS_CMD  Name         Tag Store Operation

00111   DREAD        Data-stream (DREAD) or DREAD_MODIFY tag lookup
00011   IREAD        Instruction-stream tag lookup
00010   OREAD        Ownership-read tag lookup for a write or a READ_LOCK
01000   WUNLOCK      Ownership-read tag lookup for a WRITE_UNLOCK (lookup done
                     only in ETM)
01101   R_INVAL      Cache coherency tag lookup as the result of NDAL DREAD or
                     IREAD
01001   O_INVAL      Cache coherency tag lookup as the result of NDAL OREAD or
                     WRITE
01010   IPR_DEALLOC  Tag lookup for an explicit IPR deallocate operation

There are three tag store operations which do not cause any sort of errors: tag store update after
a fill, IPR write of the tag store, and IPR read of the tag store. Thus, these commands will not
appear in BCETSTS.
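
As an illustration of how error-handling software might interpret a raw BCETSTS value, the following minimal sketch decodes the flag bits of Table 13-37 and the TS_CMD encodings of Table 13-38. The function and dictionary names are hypothetical and are not part of this specification; the sketch assumes the 32-bit register value has already been read by some other means.

```python
# Hypothetical helper: bit positions from Table 13-37, names from Table 13-38.
BCETSTS_FLAGS = {"LOCK": 0, "CORR": 1, "UNCORR": 2, "BAD_ADDR": 3, "LOST_ERR": 4}

TS_CMD_NAMES = {
    0b00111: "DREAD",
    0b00011: "IREAD",
    0b00010: "OREAD",
    0b01000: "WUNLOCK",
    0b01101: "R_INVAL",
    0b01001: "O_INVAL",
    0b01010: "IPR_DEALLOC",
}


def decode_bcetsts(value):
    """Return the flag bits and the TS_CMD name held in a raw BCETSTS value."""
    fields = {name: bool((value >> bit) & 1) for name, bit in BCETSTS_FLAGS.items()}
    ts_cmd = (value >> 5) & 0x1F  # TS_CMD occupies bits <9:5>
    # Encodings not listed in Table 13-38 are reported rather than guessed at.
    fields["TS_CMD"] = TS_CMD_NAMES.get(ts_cmd, "unlisted (" + format(ts_cmd, "05b") + ")")
    return fields
```

For example, a value with LOCK and UNCORR set and TS_CMD equal to 00010 decodes as an uncorrectable error on an OREAD tag lookup.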

13.5.3.2 Bcache Error Tag Index (BCETIDX)

This register is loaded and locked when a tag store error occurs. If a correctable error is followed 
by a second error which is not correctable, the register is loaded with information from the second, 
more serious error. Except for this case, once it is locked, it is not changed until software explicitly 
unlocks the register. This register is written by hardware and read by software. Its contents are 
not changed during reset. 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                         Backup Cache Tag Store Address                         | 0| 0| 0| 0| 0| :BCETIDX
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



BCETIDX contains the complete hexaword address corresponding to a tag store request which 
resulted in an error. Since the full address is saved, both the cache index and the cache tag of 
the request are known. Thus, this register shows what index was being accessed when the error 
occurred as well as showing what the tag of the request was. Software can compare this tag with 
the actual tag read from the RAMs, which is saved in BCETAG. 

On a BCFLUSH which incurs an error, the address used to flush the cache is captured in 
BCETIDX, not the memory address of the block. 



13.5.3.3 Bcache Error Tag (BCETAG) 

This register is loaded when a tag store error occurs. It is locked when an uncorrectable error 
occurs on a tag store access. Once the register is locked, it is not overwritten until it is unlocked 
by software. BCETAG is written by hardware and read by software. It is a read-only register
from the software point of view. The contents of BCETAG are not changed during reset. 

The register holds the data which was read from the tag store and produced the error, as shown 
in Figure 13-29. 



Figure 13-29: IPR A5 (hex), BCETAG 



 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    TAG                     |       ECC       |  |  | 0| 0| 0| 0| 0| 0| 0| 0| 0| :BCETAG
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                      |                                          |  |
                      '-TAG or 0, based on cache size            |  '-VALID
                                                                 '-OWNED



Table 13-39: BCETAG Field Descriptions

Name   Extent  Type  Description

VALID  9       RO    Valid bit
OWNED  10      RO    Ownership bit
ECC    16:11   RO    ECC check bits
TAG    31:17   RO    Backup cache tag

13.5.3.3.1 VALID

VALID is the bit read from the tag RAMs which indicates whether the block is valid in the Bcache. 

13.5.3.3.2 OWNED 

OWNED is the bit read from the tag RAMs which indicates whether the Bcache owns the block 
in question. 

13.5.3.3.3 ECC 

The ECC field contains the check bits as read from the tag RAMs during the tag access which 
produced the error. 

13.5.3.3.4 TAG 

The TAG field of BCETAG is the cache tag as read from the tag RAMs. It must be interpreted 
based on the cache size being used, as shown in Table 13-40. When certain address bits are not
used as tag bits for the cache size given, their value in BCETAG is 0. 



Table 13-40: TAG Interpretation

Cache size     Tag bits used  Unused tag bits

128 kilobytes  TAG<31:17>     None
256 kilobytes  TAG<31:18>     TAG<17>
512 kilobytes  TAG<31:19>     TAG<18:17>
2 megabytes    TAG<31:21>     TAG<20:17>
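
The comparison between the tag of the failing request (held in BCETIDX) and the tag actually read from the RAMs (held in BCETAG) can be sketched as follows. This is only an illustration: the helper names and the cache-size keys are hypothetical, and the sketch assumes that TAG<n> in BCETAG lines up with address bit n in BCETIDX, which is how Table 13-40 reads.

```python
# Hypothetical helper: lowest tag bit used for each Bcache size, per Table 13-40.
LOWEST_TAG_BIT = {"128KB": 17, "256KB": 18, "512KB": 19, "2MB": 21}


def tag_mask(cache_size):
    """Mask selecting the tag bits <31:low> used for the given cache size."""
    low = LOWEST_TAG_BIT[cache_size]
    return (0xFFFFFFFF >> low) << low


def tags_match(bcetidx, bcetag, cache_size):
    """Compare the request tag (from BCETIDX) with the stored tag (from BCETAG).

    Only the tag bits that are meaningful for the cache size take part, so
    the VALID, OWNED, and ECC fields of BCETAG are ignored.
    """
    mask = tag_mask(cache_size)
    return (bcetidx & mask) == (bcetag & mask)
```

A mismatch on a lookup that hardware treated as a hit is one hint that the stored tag, rather than the check bits, was corrupted; that interpretation is a suggestion, not something this section states.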

13.5.4 Backup Cache Data RAM Error Registers (BCEDSTS, BCEDIDX, BCEDECC)

The data RAM error registers hold data relevant to errors in the backup cache data RAMs, so 
that software can understand the problem. 

BCEDSTS holds the general status of the problem. BCEDIDX holds the data RAM index being 
used when the problem occurred. BCEDECC holds the syndrome bits as calculated on the data 
which was read from the RAMs when the problem occurred. 

If no error is yet logged in the data RAM error registers, the registers are loaded when either 
a correctable or an uncorrectable error occurs. Once the registers are loaded with information 
from a correctable error, they are locked against further correctable errors, and are only loaded 
again if an uncorrectable error happens. If an uncorrectable error happens, the LOCK bit in 
BCEDSTS is set and the registers are not overwritten until software clears the error bits. In this 
way, information from the first correctable error is held in the registers, and is only overwritten 
if an uncorrectable error happens later. 

If the registers are locked, any subsequent non-correctable error causes the LOST_ERR bit to be
set, but does not modify any other information in the registers. LOST_ERR indicates to software
that it does not have sufficient information in the error registers to recover from all uncorrectable 
errors which have occurred. 

Of the backup cache data RAM error registers, only BCEDSTS is writable by software. Software
clears the error and lock bits, which re-enables all the data RAM error registers to record the
next error which occurs.

The contents of BCEDSTS, BCEDIDX, and BCEDECC are not affected by reset. 
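
The load-and-lock rules described above can be summarized in a small behavioral model. This is a sketch for reasoning about error-handler logic, not a description of the hardware implementation: the class and attribute names are hypothetical, BAD_ADDR is folded into the generic uncorrectable case, and the model assumes that a correctable error arriving while the registers are locked changes nothing.

```python
class DataRamErrorRegs:
    """Toy model of the BCEDSTS/BCEDIDX/BCEDECC load-and-lock rules."""

    def __init__(self):
        self.lock = self.corr = self.uncorr = self.lost_err = False
        self.index = None      # stands in for BCEDIDX
        self.syndrome = None   # stands in for BCEDECC

    def record(self, index, syndrome, correctable):
        if correctable:
            # Only the first correctable error is captured, and only when
            # no error of either kind has been logged yet.
            if not (self.corr or self.lock):
                self.corr = True
                self.index, self.syndrome = index, syndrome
        elif self.lock:
            # An uncorrectable error is already held: note the loss only.
            self.lost_err = True
        else:
            # The first uncorrectable error overwrites any correctable state
            # and locks the registers.
            self.uncorr = self.lock = True
            self.index, self.syndrome = index, syndrome

    def clear(self):
        """Software write-one-to-clear of all error and lock bits."""
        self.lock = self.corr = self.uncorr = self.lost_err = False
```

Feeding the model a correctable error, then two uncorrectable errors, leaves it holding the state of the first uncorrectable error with LOST_ERR set, which matches the sequence described in the text.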



13.5.4.1 Bcache Error Data Status (BCEDSTS) 



Figure 13-30: IPR A6 (hex), BCEDSTS 



 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x|  DR_CMD   | 0| 0| 0|  |  |  |  |  | :BCEDSTS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                   |  |  |  |  |
                                                                                   |  |  |  |  '-LOCK
                                                                                   |  |  |  '-CORR
                                                                                   |  |  '-UNCORR
                                                                                   |  '-BAD_ADDR
                                                                                   '-LOST_ERR



Table 13-41: BCEDSTS Field Descriptions

Name      Extent  Type  Description

LOCK      0       WC    Lock bit. Indicates that the BCEDSTS, BCEDIDX, and BCEDECC
                        registers are locked.
CORR      1       WC    Indicates that a correctable ECC error was encountered.
UNCORR    2       WC    Indicates that an uncorrectable ECC error was encountered.
BAD_ADDR  3       WC    Indicates that an addressing error was detected.
LOST_ERR  4       WC    Indicates that a second uncorrectable error occurred; it was not
                        recorded in the error registers.
DR_CMD    11:8    R     Indicates what command was being processed by the data RAMs at
                        the time the error occurred.

The LOCK bit is set when an uncorrectable error has occurred. If only the CORR bit is set,
the data RAM error registers are locked against further correctable errors, but are reloaded if
an uncorrectable error occurs. On an uncorrectable error, the LOCK bit is set and the registers
remain locked until unlocked by software.



13.5.4.1.1 LOCK 

Whenever the data RAM error registers are loaded with an uncorrectable error, the LOCK bit is 
set. At this time either UNCORR or BAD_ADDR is also set to indicate the type of uncorrectable 
error. (A correctable error does not set BCEDSTS<LOCK>.) When the LOCK bit is set, the
BCEDSTS, BCEDIDX, and BCEDECC registers are all locked. Clearing the lock bit unlocks 
all three registers. The LOCK bit is set by hardware and it is cleared by software. It is a 
write-one-to-clear bit. 

13.5.4.1.2 CORR 

CORR is set when the data ECC decoder detects a correctable error. When this occurs, the Bcache 
Data Error registers are loaded and locked against further correctable errors; BCEDSTS<LOCK>
is not set. The CORR bit is set by hardware and it is cleared by software. It is a write-one-to-clear
bit.

13.5.4.1.3 UNCORR 

UNCORR is set when the data ECC decoder detects an uncorrectable error. When this occurs, 
the Bcache Data Error registers are loaded and locked. The UNCORR bit is set by hardware and 
it is cleared by software. It is a write-one-to-clear bit. 

13.5.4.1.4 BAD_ADDR

BAD_ADDR is set when the data ECC decoder detects an error in the address bit, indicating
some problem with the address lines going to the data RAMs. This is an uncorrectable error; thus,
when it occurs, the Bcache Data Error registers are loaded and locked. The BAD_ADDR bit is
set by hardware and it is cleared by software. It is a write-one-to-clear bit.



The contents of BCEDSTS are not affected by reset. 

13.5.4.1.5 LOST_ERR

LOST_ERR indicates that after the first uncorrectable error was recorded in the data error 
registers, an additional uncorrectable error occurred for which state was not saved. LOST_ERR 
is set by hardware and is cleared by software. It is a write-one-to-clear bit. 

13.5.4.1.6 DR_CMD 

The DR_CMD field indicates what the data RAMs were doing when the error was detected. Its 
values are listed in Table 13-42.



Table 13-42: Interpretation of DR_CMD

DR_CMD<11:8>  Name   Data RAM operation

0111          DREAD  Data lookup for a Dstream read
0011          IREAD  Data lookup for an Istream read
0100          WBACK  Data lookup for a writeback
0010          RMW    Data lookup for a read-modify-write (done for normal writes and
                     WRITE_UNLOCKs)



There are two data RAM operations which do not cause any sort of errors: full quadword writes 
and fills. Thus, these commands will not appear in BCEDSTS. 

DR_CMD is only written by hardware. It is read-only for software. 

13.5.4.2 Bcache Error Data Index (BCEDIDX) 

This register holds the index of a data RAM transaction; it is loaded when an error is detected 
on a data RAM access. The index loaded due to a correctable error is not overwritten unless an 
uncorrectable error occurs afterwards. If an uncorrectable error occurs, BCEDIDX is loaded and 
locked. BCEDIDX is unlocked by software; the lock bit is in the BCEDSTS register.

BCEDIDX is read-only from software's point of view. Its contents are not affected by reset. 
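
Figure 13-31: IPR A7 (hex), BCEDIDX

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|             Backup cache data RAM index             | 0| 0| 0| :BCEDIDX
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                            |
                                                            '-index or undefined, based on cache size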




BCEDIDX must be interpreted based on the cache size being used, as shown in Table 13-43.
When certain address bits are not used as index bits for the cache size given, their value in 
BCEDIDX is undefined. 


Table 13-43: BCEDIDX Interpretation

Cache size     Index bits used  Undefined index bits

128 kilobytes  BCEDIDX<16:3>    BCEDIDX<20:17>
256 kilobytes  BCEDIDX<17:3>    BCEDIDX<20:18>
512 kilobytes  BCEDIDX<18:3>    BCEDIDX<20:19>
2 megabytes    BCEDIDX<20:3>    None
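
Because the upper index bits are undefined for the smaller cache sizes, software should mask them off before using the index. A minimal sketch of that masking, following Table 13-43, is shown here; the function name and the cache-size keys are hypothetical.

```python
# Hypothetical helper: highest index bit used for each Bcache size, per Table 13-43.
HIGHEST_INDEX_BIT = {"128KB": 16, "256KB": 17, "512KB": 18, "2MB": 20}


def bcedidx_index(value, cache_size):
    """Return BCEDIDX with only the defined index bits <high:3> kept."""
    high = HIGHEST_INDEX_BIT[cache_size]
    mask = ((1 << (high + 1)) - 1) & ~0x7  # bits <high:3>
    return value & mask
```

The result keeps the bits in their original positions, so it can be compared directly against a quadword-aligned address.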





13.5.4.3 Bcache Error Data ECC (BCEDECC) 

This register holds the syndrome as calculated on the backup cache data and check bits. It is 
loaded when an error occurs on a data RAM access. Then it follows the same lock rules that 
the other Bcache Data Error registers follow. It is unlocked by software. The lock bit is in the 
BCEDSTS register. The contents of BCEDECC are not affected by reset.

When DISABLE_ERRORS is set, BCEDECC is loaded on every quadword read from the cache. 
This provides a way of testing the ECC logic by reading the results of the syndrome calculation. 
Note that because 4 quadwords are read from the Bcache at a time, BCEDECC will contain the 
syndrome from the LAST quadword read after the 4-qw transaction is complete. Software can 
control which quadword is read last by varying the requested quadword of a transaction; the 
Bcache controller always returns the requested quadword first, then returns the remaining 3 
quadwords in wraparound order. For example, if the programmer wants to see the contents of 
BCEDECC after quadword 2, she would do a read to quadword 3 of the block, and the quadwords 
would be read out in the order 3-0-1-2. 
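
The wraparound order described above is simple modular arithmetic. The sketch below (function names are illustrative only) computes the order in which the four quadwords are returned for a given requested quadword, and which quadword to request so that a chosen one is read last.

```python
def return_order(requested):
    """Order in which the four quadwords of a block are returned,
    starting with the requested quadword and wrapping around."""
    return [(requested + i) % 4 for i in range(4)]


def request_for_last(last):
    """Quadword to request so that quadword `last` is the final one read,
    and therefore the one whose syndrome remains in BCEDECC."""
    return (last + 1) % 4
```

For the example in the text, requesting quadword 3 gives the order 3, 0, 1, 2, so quadword 2 is read last.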

Software can use BCDECC to write known check bits to the data RAMs; when the RAMs are 
read, the syndrome is captured by BCEDECC. Once the syndrome is known, the check bits which 
were calculated by the ECC hardware can be deduced, because the check bits read from the RAMs 
were known. The syndrome is simply the XOR of the calculated check bits and the check bits 
which were read from the RAMs. 

If the programmer wants to learn what the correct checkbits for a particular data pattern should 
be, she can write data to the cache while BCDECC contains all zeros and CCTL<SW_ECC> is
set. This forces checkbits of zero to be written to the cache with the data. When the data is read 
back, BCEDECC will contain the correct checkbits for the data (the XOR of the checkbits read 
and the checkbits calculated by hardware). 

BCEDECC is read-only from software's point of view.

Figure 13-32: IPR A8 (hex), BCEDECC

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| x| x| x| x| x| x|   ECCHI   | x| x| x| x| x| x| x| x| x| x| x| x|   ECCLO   | x| x| x| x| x| x| :BCEDECC
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The ECCHI field corresponds to syndrome bits <7:4>. The ECCLO field corresponds to syndrome 
bits <3:0>. 
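
The check-bit deduction described above can be sketched as follows. The bit positions used here (ECCHI in <25:22>, ECCLO in <9:6>) are read off the layout of Figure 13-32 and should be treated as an assumption, since the text gives no extent table for this register; the function names are hypothetical.

```python
def bcedecc_syndrome(value):
    """Assemble the 8-bit syndrome from a raw BCEDECC value.

    Assumes ECCHI sits in bits <25:22> and ECCLO in bits <9:6>.
    """
    ecchi = (value >> 22) & 0xF  # syndrome bits <7:4>
    ecclo = (value >> 6) & 0xF   # syndrome bits <3:0>
    return (ecchi << 4) | ecclo


def calculated_check_bits(syndrome, check_bits_read):
    """Recover the check bits computed by the ECC hardware.

    The syndrome is the XOR of the calculated check bits and the check bits
    read from the RAMs, so XORing it with the known bits gives the other set.
    """
    return syndrome ^ check_bits_read
```

When check bits of zero were written to the RAMs, the second function returns the syndrome unchanged, which is the case the text describes for learning the correct check bits of a data pattern.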

13.5.5 Fill Error Registers (CEFADR, CEFSTS)

Some errors are related to outstanding reads to memory. These errors may be diagnosed using 
the CEFSTS and CEFADR registers. CEFSTS holds general information about the type of read 
outstanding; CEFADR holds the address of the outstanding read. The contents of these
registers are not changed during reset.

13.5.5.1 Cbox Error Fill Status (CEFSTS) 

The CEFSTS register holds information related to a problem on a read which was sent to memory. 
If a read request to memory times out or is terminated with RDE, the CEFSTS register and the 
CEFADR register are loaded and locked. 

The register is read-write. Only the lowest five bits and the UNEXPECTED_FILL bit may be 
written, and then only to clear them after an error. CEFSTS is not affected by reset. 

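Figure 13-33: IPR AC (hex), CEFSTS

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| x| x| x| x| x| x| x| x| x| x|  | x| x| x| x|COUNT|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | :CEFSTS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                |                    |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
                                |                    |  |  |  |  |  |  |  |  |  |  |  |  |  |  '-RDLK
                                |                    |  |  |  |  |  |  |  |  |  |  |  |  |  '-LOCK
                                |                    |  |  |  |  |  |  |  |  |  |  |  |  '-TIMEOUT
                                |                    |  |  |  |  |  |  |  |  |  |  |  '-RDE
                                |                    |  |  |  |  |  |  |  |  |  |  '-LOST_ERR
                                |                    |  |  |  |  |  |  |  |  |  '-ID0
                                |                    |  |  |  |  |  |  |  |  '-IREAD
                                |                    |  |  |  |  |  |  |  '-OREAD
                                |                    |  |  |  |  |  |  '-WRITE
                                |                    |  |  |  |  |  '-TO_MBOX
                                |                    |  |  |  |  '-RIP
                                |                    |  |  |  '-OIP
                                |                    |  |  '-DNF
                                |                    |  '-RDLK_FL_DONE
                                |                    '-REQ_FILL_DONE
                                '-UNEXPECTED_FILL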



Table 13-44: CEFSTS Field Descriptions

Name             Extent  Type  Description

RDLK             0       WC    Indicates that a READ_LOCK was in progress.
LOCK             1       WC    Indicates that an error occurred and the register is locked.
TIMEOUT          2       WC    FILL failed due to transaction timeout.
RDE              3       WC    FILL failed due to Read Data Error.
LOST_ERR         4       WC    Indicates that more than one error related to fills occurred.
ID0              5       RO    NDAL identification bit for the read request.
IREAD            6       RO    This is an Istream read from the Mbox which may be aborted.
OREAD            7       RO    This is an outstanding OREAD.
WRITE            8       RO    This read was done for a write.
TO_MBOX          9       RO    Data is to be returned to the Mbox.
RIP              10      RO    READ invalidate pending.
OIP              11      RO    OREAD invalidate pending.
DNF              12      RO    Do not fill - data not to be written into the cache or validated
                               when the fill returns.
RDLK_FL_DONE     13      RO    Indicates that the last fill for a READ_LOCK arrived.
REQ_FILL_DONE    14      RO    Indicates that the requested quadword was successfully returned
                               from the NDAL.
COUNT            16:15   RO    For a memory space transaction, indicates how many of the fill
                               quadwords have been successfully returned. For I/O space, is set
                               to 11(BIN) when the transaction starts as only one quadword will
                               be returned.
UNEXPECTED_FILL  21      WC    Set to indicate that an unexpected fill was received from the
                               NDAL.



13.5.5.1.1 RDLK 

RDLK is set to show that a READ_LOCK is in progress. This bit is write-one-to-clear. The side
effect of performing a write-one-to-clear to this bit is to clear the VALID bit for an entry which had
its RDLK bit set; this has the effect of clearing out the FILL_CAM entry. This is the same action
which is taken when a WRITE_UNLOCK is received. Microcode uses this functionality during
certain error sequences; the bit is implemented in the zero position to make the microcoding as
efficient as possible.

This bit is normally not read as a one by software, because the microcode ensures that the
READ_LOCK-WRITE_UNLOCK sequence is an indivisible operation. If, however, the first
quadword of a READ_LOCK is returned successfully and then the transaction either times out
or is terminated in RDE, CEFSTS is loaded with the RDLK bit set.

13.5.5.1.2 LOCK 

The LOCK bit is set when a read transaction which has been sent to memory terminates in Read 
Data Error or in Timeout. At the same time, all information corresponding to the read is loaded 
from the FILL_CAM into the CEFSTS register. When the LOCK bit is set, one of TIMEOUT, 
RDE, or UNEXPECTED_FILL is also set to indicate the type of error. Once the LOCK bit is
set, none of the information in CEFSTS or CEFADR changes, with the possible exception of 
LOST_ERR, until the LOCK bit is cleared. 

Hardware sets the LOCK bit and software clears it by writing a one to that location. 

13.5.5.1.3 TIMEOUT 

TIMEOUT is set when a read transaction which was sent to the NDAL times out for some reason. 
When TIMEOUT is set, the LOCK bit is also set. 

Hardware sets the TIMEOUT bit and software clears it by writing a one to that location. 

13.5.5.1.4 RDE

RDE (Read Data Error) is set when a read transaction which was sent to the NDAL terminates in 
RDE. When the RDE bit is set, the LOCK bit is also set. The UNEXPECTED_FILL bit will be set 
as well, if the RDE was actually unexpected (no read corresponding to the RDE was outstanding 
when that RDE was received). 

Hardware sets the RDE bit and software clears it by writing a one to that location. 

13.5.5.1.5 LOST_ERR 

The LOST_ERR bit is set when CEFSTS is already locked and another RDE, timeout, or 
unexpected fill error occurs. This indicates to software that multiple errors have happened and 
state has not been saved for every error. 

Hardware sets the LOST_ERR bit and software clears it by writing a one to that location. 

13.5.5.1.6 ID0

ID0 corresponds to the NDAL signal P%ID_H<0>, which was issued with the read that failed.
It also indicates which one of the two FILL_CAM entries was used to save information about the
transaction while it was outstanding.

13.5.5.1.7 IREAD 

IREAD indicates that the transaction in error was an IREAD. 

13.5.5.1.8 OREAD 

OREAD indicates that the transaction in error was an OREAD; the OREAD may have been done 
for a write, a READ_LOCK, or a read modify. 

13.5.5.1.9 WRITE 

WRITE indicates that the transaction in error was an OREAD done because of a write request. 

13.5.5.1.10 TO_MBOX 

TO_MBOX indicates that data returning for the read was to be sent to the Mbox.

13.5.5.1.11 RIP 

RIP (Read Invalidate Pending) is set when a cache coherency transaction due to a read on the 
NDAL is requested for a block which has OREAD fills outstanding at the time. This triggers a
writeback of the block when the fill data arrives; a valid copy of the data is kept in the cache. 

13.5.5.1.12 OIP 

OIP (Oread Invalidate Pending) is set when a cache coherency transaction due to an OREAD or 
a WRITE on the NDAL is requested for a block which has OREAD fills outstanding at the time. 
This triggers a writeback and invalidate of the block when the fill data arrives. 

13.5.5.1.13 DNF

DNF (Do Not Fill) is set when data for a read is not to be written into the Bcache. This is the 
case when the cache is off, in ETM, or when the read is to I/O space. The assertion of this bit 
prevents the block from being validated in the cache. 

13.5.5.1.14 RDLK_FL_DONE

This bit is set in the FILL_CAM when a READ_LOCK hits in the Bcache or the last fill arrives from
the BIU for a READ_LOCK. Once this is set, the corresponding WRITE_UNLOCK is allowed to
proceed. This overrides the FILL_CAM block conflict on the WRITE_UNLOCK which is inevitable
since the READ_LOCK is held in the FILL_CAM until the WRITE_UNLOCK is done.

13.5.5.1.15 REQ_FILL_DONE

REQ_FILL_DONE is set when the requested quadword of data was successfully received from 
the NDAL. This is used to allow error handling software to differentiate between an error which 
occurred before the requested data was received, and an error which occurred after the requested 
data was received. 

If the error occurs while the requested data is being returned, such as the requested data being 
returned with RDE, it is as if the requested data was not received. REQ_FILL_DONE will not 
be set because the requested data was not successfully received. 

13.5.5.1.16 COUNT 

These two bits indicate how many of the expected four quadwords have been returned successfully
from memory for this read. If they are 00(BIN), no quadwords have returned; if they are 01(BIN),
one quadword has returned, etc. If the entry was for a quadword read, the count bits are set to
11(BIN) when the reference is sent out.

As an example, if RDE is returned before any other RDR returns for a hexaword request, COUNT
will be 00(BIN), to indicate that no quadwords of data were successfully returned.

13.5.5.1.17 UNEXPECTED_FILL

UNEXPECTED_FILL is set to indicate that an RDE or an RDR cycle was received from the
NDAL with an ID for which the FILL_CAM entry was not valid. When UNEXPECTED_FILL is
set, CEFSTS and CEFADR are loaded and locked. RDE will also be set if the unexpected fill was
an RDE rather than an RDR.

UNEXPECTED_FILL is a write-one-to-clear bit which is set by hardware and cleared by software.
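
The CEFSTS layout of Table 13-44 can be decoded with a short routine such as the one below. It is a sketch only, and the names are hypothetical. The clear mask deliberately leaves RDLK out, on the assumption that a handler usually does not want the FILL_CAM side effect described in the RDLK section unless it asks for it explicitly.

```python
# Hypothetical helper: bit positions from Table 13-44.
CEFSTS_BITS = {
    "RDLK": 0, "LOCK": 1, "TIMEOUT": 2, "RDE": 3, "LOST_ERR": 4,
    "ID0": 5, "IREAD": 6, "OREAD": 7, "WRITE": 8, "TO_MBOX": 9,
    "RIP": 10, "OIP": 11, "DNF": 12, "RDLK_FL_DONE": 13,
    "REQ_FILL_DONE": 14, "UNEXPECTED_FILL": 21,
}

# Write-one-to-clear error bits, excluding RDLK because clearing it also
# clears the corresponding FILL_CAM entry.
ERROR_W1C_FIELDS = ("LOCK", "TIMEOUT", "RDE", "LOST_ERR", "UNEXPECTED_FILL")


def decode_cefsts(value):
    """Split a raw CEFSTS value into named fields."""
    fields = {name: bool((value >> bit) & 1) for name, bit in CEFSTS_BITS.items()}
    fields["COUNT"] = (value >> 15) & 0x3  # quadwords returned, bits <16:15>
    return fields


def cefsts_clear_mask(value):
    """Value to write back so that every set error bit is cleared."""
    mask = 0
    for name in ERROR_W1C_FIELDS:
        mask |= value & (1 << CEFSTS_BITS[name])
    return mask
```

Checking REQ_FILL_DONE together with COUNT lets a handler tell whether the requested quadword arrived before the error, as described in the REQ_FILL_DONE section.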



13.5.5.2 Fill Error Address (CEFADR) 

The CEFADR register holds the original quadword read address of a fill which ended in an error 
condition. It is loaded when an error is detected on a fill. It is a read-only register. 
CEFADR is locked when CEFSTS is locked. Its contents are not changed during reset.

Figure 13-34: IPR AB (hex), CEFADR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                  Fill error address                                  | 0| 0| 0| :CEFADR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

13.5.6 NDAL Error Registers (NESTS, NEOADR, NEOCMD, NEDATHI, NEDATLO, NEICMD)

The NDAL error registers hold information related to NDAL errors. NESTS, NDAL Error Status, 
holds error bits relating to any problems encountered. 

NEOADR, NDAL Error Output Address, holds the address corresponding to the cycle which was 
in error. NEOCMD, NDAL Error Output Command, holds the command bits corresponding to 
the cycle in error. 

NEDATHI, NDAL Error Data High Longword, and NEDATLO, NDAL Error Data Low Longword, 
hold the data from an NDAL cycle where NVAX detected a parity error on the bus. NEICMD, 
NDAL Error Input Command, holds the command bits corresponding to a cycle with a parity 
error. 

The contents of the NDAL error registers are not changed during reset.



13.5.6.1 NDAL Error Status IPR (NESTS) 

The NESTS register holds information about any errors which happened on the NDAL. All six
bits in this register are write-one-to-clear. Reset does not affect this register. Power-up does not
initialize the register.

Figure 13-35: IPR AE (hex), NESTS

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x|  |  |  |  |  |  | :NESTS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                |  |  |  |  |  |
                                                                                |  |  |  |  |  '-NOACK
                                                                                |  |  |  |  '-BADWDATA
                                                                                |  |  |  '-LOST_OERR
                                                                                |  |  '-PERR
                                                                                |  '-INCON_PERR
                                                                                '-LOST_PERR



Table 13-45: NESTS Field Descriptions

Name        Extent  Type  Description

NOACK       0       WC    Indicates that P%ACK_L was not asserted for an outgoing NVAX
                          cycle. This bit locks NEOADR and NEOCMD.
BADWDATA    1       WC    Indicates that an outgoing data cycle was accompanied by the
                          BADWDATA command. This bit locks NEOADR and NEOCMD.
LOST_OERR   2       WC    Indicates that multiple outgoing errors, either NOACK or
                          BADWDATA, were detected.
PERR        3       WC    Indicates that a parity error was detected on the NDAL. This bit
                          locks NEDATHI, NEDATLO, and NEICMD.
INCON_PERR  4       WC    Inconsistent parity error.
LOST_PERR   5       WC    Indicates that multiple NDAL parity errors were detected.



13.5.6.1.1 NOACK 

NOACK is set when NVAX detects that P%ACK_L was not asserted on the NDAL for an outgoing
NVAX cycle. When NOACK is set, NEOADR and NEOCMD are locked so that software can read 
them to see what transaction was being attempted when the error occurred. 

NOACK is set on any outgoing NVAX cycle which is not acknowledged, whether it was an address 
cycle or a data cycle. The information which is locked in NEOADR and NEOCMD corresponds 
to the address cycle of the transaction. For example, if an outgoing write data cycle is not 
acknowledged, the address cycle for that write operation is saved in NEOADR and NEOCMD. 

NOACK is not set if there was a previous BADWDATA. If a BADWDATA cycle is NOACKed, both
BADWDATA and NOACK are set.

NOACK is cleared by a write-one-to-clear. 

13.5.6.1.2 BADWDATA 

BADWDATA is set when the BIU receives data for a writeback from the cache which had an 
uncorrectable ECC error, and thus is being issued on the NDAL with the BADWDATA command. 
When BADWDATA is set, NEOADR and NEOCMD are locked so that software can read them to 
retrieve the information about the failure. 

The address for the write operation is captured in NEOADR, and the command information for 
the cycle is captured in NEOCMD. 

BADWDATA is not set if there was a previous NOACK. If a BADWDATA cycle is NOACKed, both
BADWDATA and NOACK are set.

13.5.6.1.3 LOST_OERR 

LOST_OERR is set when NOACK or BADWDATA is already set and another one of those errors 
occurs. It notifies software that state was saved only for the first outgoing error. 

LOST_OERR is cleared by a write-one-to-clear. 

13.5.6.1.4 PERR 

PERR is set when NVAX detects a parity error on the NDAL. When PERR is set, NEDATHI, 
NEDATLO, and NEICMD are locked so that software can read them to see what was on the 
NDAL when the error occurred. 

Since NVAX calculates parity on every cycle, PERR will be set on both its own transfers and the 
transfers of other devices which fail the parity check. 

PERR is cleared by a write-one-to-clear. 

13.5.6.1.5 INCON_PERR

INCON_PERR (Inconsistent parity error) is set when an NDAL parity error is detected on a cycle
which is also acknowledged with P%ACK_L. This means that NVAX detected a parity error but
some other device acknowledged the transfer.

INCON_PERR is only set in conjunction with PERR. It is not set unless PERR is set. If one
NDAL parity error has already occurred, setting PERR, but INCON_PERR was not set for that
cycle, a subsequent cycle with an inconsistent parity error will not cause INCON_PERR to be set.

INCON_PERR is cleared by a write-one-to-clear.



13.5.6.1.6 LOST_PERR 

LOST_PERR is set when PERR is already set and another NVAX transfer fails the parity check. 
LOST_PERR notifies software that multiple NVAX transfers have failed the parity check; state 
was saved only for the first. 

LOST_PERR is cleared by a write-one-to-clear. 
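
The relationship between the NESTS bits and the registers they lock can be sketched as below. The names are hypothetical, and the raw NESTS value is assumed to have been read already; the mapping itself follows Table 13-45 and the NOACK, BADWDATA, and PERR descriptions above.

```python
# Hypothetical helper: bit positions from Table 13-45.
NESTS_BITS = {"NOACK": 0, "BADWDATA": 1, "LOST_OERR": 2,
              "PERR": 3, "INCON_PERR": 4, "LOST_PERR": 5}


def decode_nests(value):
    """Split a raw NESTS value into its six write-one-to-clear flags."""
    return {name: bool((value >> bit) & 1) for name, bit in NESTS_BITS.items()}


def locked_ndal_registers(value):
    """List the NDAL error registers that hold locked state for this NESTS value."""
    flags = decode_nests(value)
    locked = []
    if flags["NOACK"] or flags["BADWDATA"]:
        locked += ["NEOADR", "NEOCMD"]            # outgoing-cycle error state
    if flags["PERR"]:
        locked += ["NEDATHI", "NEDATLO", "NEICMD"]  # parity-error state
    return locked
```

A handler can use the returned list to decide which registers are worth reading before it clears NESTS; registers that are not locked hold only information from the most recent cycle.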



13.5.6.2 NDAL Error Output Address IPR (NEOADR) 

The NEOADR register is loaded for every address cycle which the Cbox drives onto the NDAL, 
unless it is locked. It is loaded during the cycle when the corresponding P%ACK_L should be 
asserted on the NDAL. It is locked when the NOACK bit in the NESTS register is set. 

When NEOADR is locked, it contains the address information for the first transaction which 
failed. If it is read when it is not locked, it contains information from the last address cycle 
which was acknowledged on the NDAL. 

The format of NEOADR matches the low longword of the NDAL during an address cycle. 
NEOADR is read-only to software. Its contents are not changed during reset. 



Figure 13-36: IPR B0 (hex), NEOADR 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         NDAL address                                          | : NEOADR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



13.5.6.3 NDAL Error Output Command (NEOCMD) 

The NEOCMD register is loaded and locked exactly as NEOADR is loaded and locked. The 
format of NEOCMD is similar to that of the high longword of the NDAL during an address cycle. 
The high quadword byte enable positions are NOT included, since NVAX only uses quadword 
byte-enabled transactions; and the NDAL ID and command are added in the lower four bits of 
the longword. 

The contents of NEOCMD are not affected by reset. 



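Figure 13-37: IPR B2 (hex), NEOCMD 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| LEN | x| x| x| x| x| x| x| x| x| x| x| x| x| x|        BYTE_EN        | 0|   ID   |    CMD    | : NEOCMD
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+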










Table 13-46: NEOCMD Field Descriptions 

Name      Extent   Type   Description 

CMD       3:0      RO     NDAL command as driven by NVAX during the transaction. 
                          For specific values, see Section 3.3.4.2. 

ID        6:4      RO     Commander ID as driven by NVAX during the transaction. 
                          For specific values, see Section 3.3.4.3. 

BYTE_EN   15:8     RO     Byte enable as driven by NVAX during the transaction. 
                          For specific values, see Section 3.3.4.1. 

LEN       31:30    RO     Length of the NDAL transaction. For specific values, 
                          see Section 3.3.4.1. 



The meanings of these fields are described in Chapter 3. 
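
As an illustration of the layout in Table 13-46, the following sketch splits a longword read from NEOCMD into its fields. The function name is hypothetical; the bit extents are taken directly from the table, and the encodings of the field values themselves are left to Chapter 3.

```python
def decode_neocmd(value):
    """Split a 32-bit NEOCMD value into the fields of Table 13-46."""
    return {
        "LEN": (value >> 30) & 0x3,       # bits 31:30
        "BYTE_EN": (value >> 8) & 0xFF,   # bits 15:8
        "ID": (value >> 4) & 0x7,         # bits 6:4
        "CMD": value & 0xF,               # bits 3:0
    }
```

Bits 29:16 and bit 7 are ignored by the sketch, since the figure marks them as don't-care or zero.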

13.5.6.4 NDAL Error Input Command (NEICMD) 

NEICMD, NEDATHI, and NEDATLO are loaded and locked together. All three are loaded when 
a parity error occurs; at the same time the PERR bit is set in NESTS, which locks the three 
registers. If a second NDAL parity error occurs, the registers are not reloaded; they remain 
locked until software clears PERR. 

NEICMD contains the P%CMD_H<3:0>, P%ID_H<2:0>, and P%PARITY_H<2:0> bits from the 
failed transfer. 

NEICMD is a read-only register. Its contents are not changed during reset. 

Figure 13-38: IPR B8 (hex), NEICMD 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| x| PARITY |   ID   |    CMD    | : NEICMD
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



13.5.6.4.1 PARITY 

The PARITY field corresponds to the NDAL lines P%PARITY_H<2:0>. 

13.5.6.4.2 ID 

The ID field corresponds to the NDAL lines P%ID_H<2:0>. 

13.5.6.4.3 CMD 

The CMD field corresponds to the NDAL lines P%CMD_H<3:0>. 

13.5.6.5 NDAL Error Data High and NDAL Error Data Low (NEDATHI and NEDATLO) 

NEDATHI and NEDATLO behave analogously to NEICMD. They capture P%NDAL_H<63:0> 
during a cycle with a parity error. NEDATHI contains the high longword of data from the 
NDAL (P%NDAL_H<63:32>); NEDATLO contains the low longword of data from the NDAL 
(P%NDAL_H<31:0>). 

The format of NEDATHI and NEDATLO must be interpreted based on the CMD found in 
NEICMD. If the CMD field shows that the cycle was a data cycle, the registers contain two 
longwords of data. If the CMD field shows that the cycle was an address cycle, the registers are 
in the format of an NDAL address cycle, as shown in Figure 13-39 and Figure 13-40. 

The contents of NEDATHI and NEDATLO are not affected by reset. 






DIGITAL CONFIDENTIAL 



NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 



Figure 13-39: IPR B4 (hex), NEDATHI 



 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| LEN | UNDEFINED | BYTE_EN | UNDEFINED | : NEDATHI 
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



Figure 13-40: IPR B6 (hex), NEDATLO 



 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                            address                                            | : NEDATLO
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

13.5.7 Backup Cache Tag Store Access Through IPR Reads and Writes (BCTAG) 

Direct access to the backup cache tag store is provided to aid in error recovery and diagnosis and 
to assist testing. These accesses work whether the cache is on or off, in ETM or in force hit mode. 

If there is a valid FILL_CAM entry for the same cache block which is being accessed through an 
IPR read or write, the IPR read or write is stalled until the fills return and the FILL_CAM entry 
is no longer valid. 

When the backup cache tag store is being accessed through IPR reads and writes, address bits 
<24:22> = 100 (BINARY). Address bits <20:5> are used as the index into the tag store RAMs; 
these indicate which backup cache location is to be written or read. 

Figure 13-41: Backup Cache Tag Store IPR Addressing Format 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|        SBZ         | 1| 0| 0| x|           |            BCTAG Index            |     SBZ      |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                       |
                                       '- BCTAG Index or SBZ, based on cache size. 



Some or all of bits <20:17> are not actually used as the index if the cache is smaller than 2 
megabytes. This is set out explicitly in Table 13-48. 

The format for reading and writing the backup cache tag store as an IPR is described in 
Figure 13-42 and Table 13-47. 

Figure 13-42: IPRs 01000000 thru 011FFFE0 (hex), BCTAG 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|              TAG               |           |       ECC       |  |  | x| x| x| x| x| x| x| x| x| : BCTAG
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                       |                         |  |
                                       '- TAG or 0, based        |  '- VALID
                                          on cache size          '- OWNED



Table 13-47: BCTAG Field Descriptions 

Name      Extent   Type    Description 

VALID     9        RW      Valid bit 

OWNED     10       RW      Ownership bit 

ECC       16:11    RW (1)  ECC check bits 

TAG       31:17    RW      Tag data 

(1) The ECC bits are written from the value given in the IPR_WRITE only if the SW_ECC bit of the CCTL IPR is set. 
Otherwise, the Cbox generates and writes correct ECC for the tag, owned and valid values being written. 

Some or all of TAG<20:17> are not actually used as tag if the cache is larger than 128 kilobytes. 
This is set out in Table 13-48. 



Table 13-48: Tag and Index Interpretation for BCTAG IPR 

Cache size      Tag bits used   Index bits used 

128 kilobytes   TAG<31:17>      Index<16:5> 
256 kilobytes   TAG<31:18>      Index<17:5> 
512 kilobytes   TAG<31:19>      Index<18:5> 
2 megabytes     TAG<31:21>      Index<20:5> 



The tag store must be initialized to a known state when the chip is powered up. This is done 
through IPR_WRITEs to BCTAG. 
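
As a rough sketch of how initialization software might walk the tag store, the following generator produces one BCTAG IPR address per tag store entry. The names are hypothetical. The sketch assumes one tag entry per 32-byte cache block, which is consistent with the index starting at address bit 5 and with the index widths in Table 13-48; the value written to each entry is not shown.

```python
BCTAG_BASE = 0x01000000   # address bits <24:22> = 100 (binary)
BLOCK_BYTES = 32          # assumed: one tag entry per 32-byte block


def bctag_addresses(cache_bytes):
    """Yield the BCTAG IPR address of every tag store entry."""
    for index in range(cache_bytes // BLOCK_BYTES):
        # The index occupies address bits <20:5> (fewer for small caches).
        yield BCTAG_BASE | (index << 5)
```

For a 2-megabyte cache the last address produced is 011FFFE0 (hex), which agrees with the IPR range given in the caption of Figure 13-42.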

When the tag store is read, the ECC check bits are read out directly from the tag store in the 
format shown. ECC is not checked on IPR accesses to the tag store; no errors can occur during 
these accesses. 

Some care must be taken if IPR reads of the tag store are done while other transactions are in 
progress. The tag information read out may not be what the programmer expects if cache misses 
or cache coherency transactions are in progress on the block which is being read. For example, 
if a cache miss is in progress, the new tag will be in the tag store but the valid and owned bits 
will be clear. 

13.5.8 Backup cache deallocates through IPR access (BCFLUSH) 

The Backup Cache Deallocate IPR is a write-only register which software uses to explicitly request 
the deallocation of a cache block. For example, this register may be used when hardware has put 
the cache into ETM and software wants to request writeback of the owned blocks to memory. 

If there is a valid FILL_CAM entry for the same cache block which is being flushed, the flush is 
stalled until the fills return and the FILL_CAM entry is no longer valid. 

Figure 13-43: IPRs 01400000 thru 015FFFE0 (hex), BCFLUSH 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|        SBZ         | 1| 0| 1| x|          Bcache Tag Deallocate Index          |     SBZ      | : BCFLUSH
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



When BCFLUSH is written, the Cbox accesses the tag store. If the block is invalid, no further 
action is taken. If the block is valid but not owned, the Cbox sends a block invalidate to the Mbox 
and invalidates the entry in the Bcache tag store. If the block is valid and owned, it sends a 
block invalidate to the Mbox, performs a writeback of the data, and invalidates the entry in the 
tag store. 
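
The three cases above can be summarized in a short sketch. The function name and the action strings are illustrative only; the sketch simply lists, in the order given in the text, what the Cbox does for each combination of the VALID and OWNED bits.

```python
def bcflush_actions(valid, owned):
    """List the Cbox actions for a BCFLUSH of a block in the given state."""
    if not valid:
        return []                        # invalid block: nothing to do
    actions = ["send block invalidate to Mbox"]
    if owned:
        actions.append("write back data")
    actions.append("invalidate tag store entry")
    return actions
```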

This behavior takes place whether the cache is on, off, in ETM, or in FORCE_HIT mode. In 
FORCE_HIT mode, BCFLUSH does a real lookup of the tag store and does not force the access 
to hit. Software must take care not to force deallocates when cache state is not consistent with 
the state of memory. For example, when the cache is off, valid and owned bits may be set for 
blocks which are no longer up-to-date with respect to memory. 

When a deallocate is done, the VALID and OWNED bits will be cleared as necessary, and the 
value of the stored TAG is modified. Its value is UNPREDICTABLE. Correct ECC is stored on 
the tag store entry. 

A BCFLUSH operation never changes the data stored in the data RAMs. 

Errors are detected and reported during BCFLUSH operations. 

The index given is interpreted as in Table 13-48, based on the size of the cache. 

BCFLUSH may be used when the Bcache is on, as the Pcache is kept a subset of the Bcache 
during these operations. However, new blocks may be allocated due to memory reads and writes 
as the cache is being flushed. 

13.6 Cbox Control Description 

The Cbox control consists of the following sections: 

• Mbox Interface. Controls receiving commands from the Mbox including checking for 
read/write conflicts, and sending data and invalidates back to the Mbox. 

• Cbox Arbiter. Decides which Cbox request should be serviced next. 

• Tag Store Control. Controls access to the tag store RAMs, hit calculation, ECC generation 
and checking for tag RAMs, tag RAM error handling. 

• Data Ram Control. Controls access to the data RAMs, ECC generation and checking for data 
RAMs, data RAM error handling. 

• NDAL interface. Controls access to the NDAL queues and implements the NDAL protocol 
described in Chapter 3. 

The tag store controller is a state machine which executes any of the following tasks, upon 
instruction from the arbiter: 

• C_TAG%%DREAD_CMD. Performs a lookup for a data-stream read. Hits if tag matches and 
is valid. 

• C_TAG%%IREAD_CMD. Performs a lookup for an instruction-stream read. Hits if tag 
matches and is valid. The operation may be cancelled midstream if the IREAD is aborted. 

• C_TAG%%OREAD_CMD. Performs a lookup which requires ownership. Hits if tag matches 
and is valid and owned. 

• C_TAG%%R_INVAL_CMD. Performs a cache coherency lookup as the result of an NDAL 
DREAD or IREAD; clears OWNED if necessary. 

• C_TAG%%O_INVAL_CMD. Performs a cache coherency lookup as the result of an NDAL 
OREAD or WRITE; clears VALID and/or OWNED if necessary. 

• C_TAG%%FILL_CMD. Sets the VALID and/or OWNED bit for a fill which has completed. 

• C_TAG%%IPR_DEALLOC_WRITE_CMD. Performs a lookup for a deallocate; clears VALID 
and OWNED bits if the block was owned. 

• C_TAG%%IPR_TAG_WRITE_CMD. Writes the tag store with given data. 

• C_TAG%%IPR_TAG_READ_CMD. Reads the tag store from the location requested. 

When the command given has been executed, the tag store controller notifies the arbiter that it 
has finished. 

The data RAM controller is a state machine which executes any of the following tasks, upon 
instruction from the arbiter: 

• C_DAT%%DREAD_CMD. Reads four quadwords of data-stream data from the Bcache and 
sends them to the Mbox interface. 

• C_DAT%%IREAD_CMD. Reads four quadwords of instruction-stream data from the Bcache 
and sends them to the Mbox interface. The operation may be cancelled midstream if the 
IREAD is aborted. 

• C_DAT%%WB_CMD. Reads four quadwords of data from the Bcache and sends them to the 
WRITEBACK_QUEUE. 

• C_DAT%%RM_WRITE_CMD. Performs a read-modify-write operation on the Bcache 
quadword. 

• C_DAT%%WRITE_BMO_CMD. Performs a full quadword write on the Bcache. 

• C_DAT%%FILL_CMD. Writes fill data into the Bcache; merges write data with the fill if 
necessary. 

When the command given has been executed, the data RAM controller notifies the arbiter that 
it has finished. 

The arbiter looks at the DREAD_LATCH, the IREAD_LATCH, the WRITE_QUEUE, and 
incoming transactions from the CBOX_BIU_INTERFACE to decide which to service next. It 
notifies the tag store controller and data RAM controller of which command to execute next. 

Fills and cache coherency requests both arrive in the NDAL_IN_QUEUE and are sent to the 
Cbox proper through the CBOX_BIU_INTERFACE. They are processed in order; therefore, one 
does not have priority over the other. 

When a transaction such as a read miss causes a cache block to be deallocated, the 
deallocate always takes place as the next data RAM transaction. Transactions in the 
CBOX_BIU_INTERFACE take next-highest priority. In the normal case, the DREAD_LATCH 
takes next priority, the IREAD_LATCH next, and the WRITE_QUEUE takes lowest priority. 
These priorities change if there are special circumstances, as shown in the tables which follow. 

Table 13-49: Cbox Task Priority Under Normal Conditions 

Priority   Source of Transaction 

1          Deallocate caused by previous transaction. 

2          CBOX_BIU_INTERFACE (Fills and cache coherency requests) 

3          DREAD_LATCH 

4          IREAD_LATCH 

5          WRITE_QUEUE 



Table 13-50: Cbox Task Priority When DWR_CONFLICT Bits Are Set in the WRITE_QUEUE 

Priority   Source of Transaction 

1          Deallocate caused by previous transaction. 

2          CBOX_BIU_INTERFACE (Fills and cache coherency requests) 

3          IREAD_LATCH 

4          WRITE_QUEUE 

5          DREAD_LATCH - not serviced until DWR_CONFLICT bits are clear 

Table 13-51: Cbox Task Priority When IWR_CONFLICT Bits Are Set in the WRITE_QUEUE 

Priority   Source of Transaction 

1          Deallocate caused by previous transaction. 

2          CBOX_BIU_INTERFACE (Fills and cache coherency requests) 

3          DREAD_LATCH 

4          WRITE_QUEUE 

5          IREAD_LATCH - not serviced until IWR_CONFLICT bits are clear 



Table 13-52: Cbox Task Priority When a DREAD_LOCK Is in Progress, Until the 
WRITE_UNLOCK Is Done 

Priority   Source of Transaction 

1          Deallocate caused by previous transaction. 

2          CBOX_BIU_INTERFACE (Fills and cache coherency requests) 

3          WRITE_QUEUE - the WRITE_UNLOCK corresponding to the DREAD_LOCK is the 
           only write which will arrive unless an error occurs; in this case the IPR_WRITE 
           clearing the RDLK bit in the FILL_CAM is the next write to arrive. 

4          DREAD_LATCH - not serviced until the WRITE_UNLOCK completes or the 
           FILL_CAM RDLK bit is cleared. 

5          IREAD_LATCH - not serviced until the WRITE_UNLOCK completes or the 
           FILL_CAM RDLK bit is cleared. 
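
The four priority tables can be read as a simple selection rule, sketched below. This is a simplified model rather than a description of the arbiter logic: the condition names, the `next_source` function, and the treatment of exactly one condition at a time are assumptions made for illustration, and the resource checks described in the following paragraphs are ignored.

```python
# Priority order for each condition, highest priority first.
PRIORITY = {
    "normal": ["DEALLOCATE", "CBOX_BIU_INTERFACE", "DREAD_LATCH",
               "IREAD_LATCH", "WRITE_QUEUE"],
    "dwr_conflict": ["DEALLOCATE", "CBOX_BIU_INTERFACE", "IREAD_LATCH",
                     "WRITE_QUEUE", "DREAD_LATCH"],
    "iwr_conflict": ["DEALLOCATE", "CBOX_BIU_INTERFACE", "DREAD_LATCH",
                     "WRITE_QUEUE", "IREAD_LATCH"],
    "read_lock": ["DEALLOCATE", "CBOX_BIU_INTERFACE", "WRITE_QUEUE",
                  "DREAD_LATCH", "IREAD_LATCH"],
}

# Sources marked "not serviced until ..." in the tables.
BLOCKED = {
    "normal": set(),
    "dwr_conflict": {"DREAD_LATCH"},
    "iwr_conflict": {"IREAD_LATCH"},
    "read_lock": {"DREAD_LATCH", "IREAD_LATCH"},
}


def next_source(pending, condition="normal"):
    """Return the highest-priority pending source that may be serviced."""
    for source in PRIORITY[condition]:
        if source in pending and source not in BLOCKED[condition]:
            return source
    return None
```

For example, with a data read and a write both pending, the read is chosen under normal conditions, while the write is chosen when DWR_CONFLICT bits are set.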

There are various resources in the Cbox which must be available for the start of a transaction. 
The necessary conditions vary, depending on the transaction in question. 

Necessary conditions before servicing a fill from the CBOX_BIU_INTERFACE are as follows: 

1. The data RAMs and the tag store must be free. The tag store is only strictly necessary for 
the last fill but for implementation simplicity, both are required for all fills. 

2. The WRITEBACK_QUEUE must not be full. A writeback may be necessary at the completion 
of the fill. 



Necessary conditions before servicing a cache coherency request from the 
CBOX_BIU_INTERFACE are as follows: 

1. The tag store must be free. 

2. The WRITEBACK_QUEUE must not be full. 



Necessary conditions before servicing a transaction from the DREAD_LATCH or the 
IREAD_LATCH are as follows: 

1. The data RAMs and the tag store must be free. 

2. A FILL_CAM entry must be available, in case the read misses. 

3. There must be an available entry in the NON_WRITEBACK_QUEUE, in case the read misses. 

4. There must be no valid entry in the FILL_CAM for the same cache block as that of the new 
request. 

5. There must be no RDLK bit set in the FILL_CAM, indicating that a READ_LOCK - 
WRITE_UNLOCK sequence is in progress. 

6. There must be no block conflict with any WRITE_QUEUE entry. 

7. The WRITEBACK_QUEUE must not be full. 

Necessary conditions before servicing a full quadword write from the WRITE_QUEUE are as 
follows: 

1. The tag store must be free. 

2. If a read lock is not outstanding, a FILL_CAM entry must be available, in case the write 
misses and requires an OREAD. 

3. If a read lock is not outstanding, there must be an available entry in the 
NON_WRITEBACK_QUEUE, in case the write misses. 

4. There must be no valid entry in the FILL_CAM for the same cache block as that of the new 
request, unless the new request is a WRITE_UNLOCK. 

5. If there is a READ_LOCK in the FILL_CAM, the fills for the READ_LOCK must have 
completed. 

6. The WRITEBACK_QUEUE must not be full. 

The tag store lookup for a full quadword write may be done while the data RAMs are busy with 
another transaction. When the data RAMs free up, the full quadword write is done. If full 
quadword writes are streaming through the WRITE_QUEUE, this effectively pipelines the tag 
store accesses and the data RAM accesses so that the writes take place at the maximum write 
rep rate of the data RAMs. This would not be the case if the arbiter required both the data RAMs 
AND the tag store to be free before starting the full quadword write. 

Necessary conditions before servicing any WRITE_QUEUE entry other than a full quadword 
write are as follows: 

1. The tag store and the data RAMs must be free. 

2. If a read lock is not outstanding, a FILL_CAM entry must be available, in case the write 
misses and requires an OREAD. 

3. If a read lock is not outstanding, there must be an available entry in the 
NON_WRITEBACK_QUEUE, in case the write misses. 

4. There must be no valid entry in the FILL_CAM for the same cache block as that of the new 
request, unless the new request is a WRITE_UNLOCK. 

5. If there is a READ_LOCK in the FILL_CAM, the fills for the READ_LOCK must have 
completed. 

6. The WRITEBACK_QUEUE must not be full. 

From the above lists, the following is true: 

1. When the data RAMs are busy, the only tag store operations which may proceed are cache 
coherency requests and full quadword write requests. 

2. No transaction from the Mbox which produces a block conflict with the FILL_CAM 
may proceed, except a WRITE_UNLOCK. This includes I/O space transactions and IPR 
transactions, for implementation simplicity. 

13.7 Transaction Descriptions 

13.7.1 IPR Reads and IPR Writes 

These transactions are described in Section 13.5. 

13.7.2 I/O Space 

I/O space references are recognized when address bits <31:29> are equal to all ones. Address 
bits <31:0> are used for I/O space reads and writes, which may reference bytes. All bits of the 
address are driven onto the NDAL. 
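
The I/O space test described above amounts to checking the top three bits of the 32-bit address. A minimal sketch, with a hypothetical function name:

```python
def is_io_space(address):
    """Return True if address bits <31:29> are all ones."""
    return ((address >> 29) & 0x7) == 0x7
```

Under this test, every address from E0000000 (hex) through FFFFFFFF (hex) is treated as I/O space.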

In addition, the byte enable field is valid for all I/O space reads and writes, as described in 
Chapter 3. When the Cbox receives an I/O space read or write, it passes the byte enable from 
the Mbox out through the BIU to the NDAL. 

I/O space references are never cached in the Bcache. All such references are passed directly to 
the NDAL. I/O space fill data which returns is passed directly to the Mbox. 

I/O space references are always quadword length. When the quadword returns on the NDAL, 
the Cbox returns it directly to the Mbox and asserts C%LAST_FILL_H so the Mbox does not expect 
any more fills. 

I/O space references also result from IPR_READs and IPR_WRITEs to the Cbox which are not 
in Cbox register space. The Cbox converts these to I/O space reads and writes, as described in 
Section 13.5. 

Before an I/O space read is allowed to proceed, the WRITE_QUEUE is flushed. I/O space 
writes are naturally ordered with respect to previous I/O space writes since they go into the 
WRITE_QUEUE behind any previous I/O space writes. They are also ordered with respect to 
previous reads and subsequent reads through the write conflict bit mechanism. 

There are situations where I/O space writes will appear out of order with respect to memory 
space writes. See Section 13.14 for an explanation of when this may happen. 

READ_LOCKs and WRITE_UNLOCKs to I/O space are not supported by the Cbox. If software 
issues these transactions through the Mbox, the Cbox converts them to normal DREADs and 
WRITEs on the NDAL. 

13.7.3 Clear Write Buffer 

In previous systems, Clear Write Buffer (CWB) was implemented as a separate command. 
NVAX implements this as an IPR read or write which the Cbox converts into an I/O space 
read or write on the NDAL. As this transaction passes through the Cbox, it has the effect 
of clearing previous entries in the WRITE_QUEUE, the NON_WRITEBACK_QUEUE, and the 
WRITEBACK_QUEUE. 

An IPR_READ to clear the write buffers causes all the DWR_CONFLICT and IWR_CONFLICT 
bits in the WRITE_QUEUE to be set. All writes are flushed as top priority, and then the I/O space 
read is issued to the NDAL and system. Which device responds to the read is system-dependent. 

An IPR_WRITE to clear the write buffers goes into the WRITE_QUEUE. If any reads are 
outstanding, they complete first due to their higher priority and then the writes complete. If 
a new read arrives while the IPR_WRITE is still in the WRITE_QUEUE, the conflict bit is set 
for that entry so the read does not complete until after the IPR_WRITE to clear the write buffer. 
After that IPR_WRITE completes, read/write priority goes back to the default behavior. 

The Clear Write Buffer has the effect of clearing both the WRITEBACK_QUEUE and the 
NON_WRITEBACK_QUEUE, as follows: the CWB, whether issued as an IPR_READ or an 
IPR_WRITE, enters the NON_WRITEBACK_QUEUE. Since the WRITEBACK_QUEUE takes 
priority over the NON_WRITEBACK_QUEUE, any previous writebacks will be issued to the 
NDAL before the CWB is issued from the NON_WRITEBACK_QUEUE. Any entries which were 
already in the NON_WRITEBACK_QUEUE will be issued before the CWB, as transactions in the 
queue are always issued in order. Thus, before the CWB completes, both outgoing NDAL queues 
are flushed of all previous transactions. If the CWB is issued as an IPR_READ, software receives 
positive acknowledgement that the queues were cleared when the fill returns. 

The IPR_WRITE is issued to the NDAL as an I/O space write. As with the I/O space read to clear 
the write buffers, the device which responds is system-dependent. 

13.7.4 Memory Read Hit 

Several different kinds of memory reads may arrive from the Mbox, as shown in the following 
table. 



Read Cbox action 

IREAD hits if tag matches and valid bit is set 

DREAD hits if tag matches and valid bit is set 

DREAD_MODIFY hits if tag matches and valid bit is set 

DREAD_LOCK hits if tag matches, valid bit is set, and ownership bit is set 



When the Mbox asserts M%CBOX_REF_ENABLE_L, the Cbox takes the command from M%S6_CMD_H. 
If the backup cache is occupied with another transaction, the Cbox puts an IREAD into the 
IREAD_LATCH or a DREAD into the DREAD_LATCH for later processing. Otherwise, the read 
bypasses the read latches and is started immediately. 

When both the tag store and the data RAMs are free, the transaction starts. The tag lookup is 
done in parallel with the data lookup. If the read hits, data is driven from the backup cache RAMs 
back through the CM_OUT_LATCH. The fill command is sent to the Mbox on C%CBOX_CMD_H<1:0>. 
Two cycles later, the Pcache fill is done while the Cbox drives data onto B%S6_DATA_H<63:0>. 

Using the fastest RAM speed configuration, the backup cache access incurs an additional 4-cycle 
latency penalty beyond the Pcache access. Each subsequent quadword in the block takes an extra 
two cycles from the previous quadword. 

On a read hit in the backup cache, the requested quadword is always returned first to the Mbox. 
The subsequent quadwords are sent in wrapped order as shown in Table 13-53. 



Table 13-53: Order of quadwords read from the Bcache 

Requested QW   2nd QW returned   3rd QW returned   4th QW returned 

QW0            QW1               QW2               QW3 
QW1            QW2               QW3               QW0 
QW2            QW3               QW0               QW1 
QW3            QW0               QW1               QW2 
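
The wrapped order in Table 13-53 is a modulo-4 rotation starting at the requested quadword. A minimal sketch, with a hypothetical function name and quadwords numbered 0 through 3:

```python
def wrapped_order(requested):
    """Return the quadword numbers in the order they are read out."""
    return [(requested + i) % 4 for i in range(4)]
```

Each row of the table corresponds to one call, for example `wrapped_order(2)` gives the row for QW2.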

13.7.5 Read Miss and Fill 

At the same time the tag store access is done for a read, the address is put in the FILL_CAM. If the 
read misses, that entry is validated and the address is sent to the NON_WRITEBACK_QUEUE. 

If the read command was DREAD_MODIFY and missed, it is converted to an OREAD on the 
NDAL. All other reads are sent as either IREADs or DREADs on the NDAL. 

From the NON_WRITEBACK_QUEUE the request goes across the NDAL to the memory 
interface. When the memory interface returns the fill, the Cbox puts the fill into the 
NDAL_IN_QUEUE. Since the block size is 32 bytes and the NDAL is 8 bytes wide, four fill 
transactions on the NDAL result from the read request. 

The arbiter services the CBOX_BIU_INTERFACE, and thus the fill, as highest priority. At this 
time, Cbox control takes the fill from the CBOX_BIU_INTERFACE and puts the data in the 
CM_OUT_LATCH. At the same time it starts writing the backup cache RAMs with the data, 
which takes at least three cycles, depending on RAM access time. The fill data is driven to the 
Mbox from the CM_OUT_LATCH as described in the cache hit section preceding. 

As fill data returns, the Cbox keeps track of how many quadwords have been received with a 
two-bit counter in the FILL_CAM. If two read misses are outstanding, fills from the two misses 
may return interleaved, so each entry in the FILL_CAM has a separate counter. When the last 
quadword of a read miss arrives, the new tag is written and the valid bit is set in the cache. The 
owned bit is set if the fill was for an Ownership Read. The FILL_CAM is made available for the 
next cache miss. 
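
The per-entry fill counting can be sketched as follows. This is an illustrative model only: the class, its attribute names, and the dictionary returned for the tag store update are hypothetical, and the RIP, OIP, and DNF bits are not modeled.

```python
class FillCamEntry:
    """Toy model of one FILL_CAM entry tracking a four-quadword fill."""

    def __init__(self, ownership_read=False):
        self.ownership_read = ownership_read
        self.count = 0        # two-bit count of quadwords received so far
        self.in_use = True

    def quadword_arrived(self):
        """Record one fill quadword.

        Returns the tag store update when the last quadword arrives,
        otherwise None.
        """
        last = self.count == 3
        self.count = (self.count + 1) & 0x3
        if not last:
            return None
        self.in_use = False   # entry is available for the next miss
        return {"VALID": True, "OWNED": self.ownership_read}
```

Because each entry carries its own counter, fills for two outstanding misses can arrive interleaved without confusing the count for either block.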

If the RIP or OIP bit is set (and DNF is not set) in the FILL_CAM when the last fill returns, the 
arbiter immediately notifies the tag store control to start a cache coherency transaction on that 
block; nothing intervenes between the last fill and the cache coherency transaction. 

13.7.6 Write Hit 

A write from the WRITE_QUEUE is begun by accessing the tag store. It is a write hit if the tag 
matches, the valid bit is set, and the ownership bit is set. In this case the write data may be 
written into the data RAMs. The data RAMs are not accessed for the write until it is determined 
that the write hit. 

The write is somewhat complicated because we have ECC across 8 bytes in the data RAMs. If 
all bytes in the quadword are not to be written with new data, the old data is read out of the 
data RAMs during the tag store lookup and before the write is done. The new data is merged 
with the old so that ECC can be calculated across the new quadword. This action is known as 
read-modify-write. 

If byte enable indicates that the write is a full quadword write, the read-modify-write is not 
necessary. In this case, the tag store lookup may proceed even if the data RAMs are not available; 
when the RAMs then become available, the write is done (assuming the tag store access resulted 
in hit-owned). This allows sequential full quadword writes to be effectively pipelined, as the tag 
store lookup for the next write may proceed while the current write is being done into the data 
RAMs. If the fastest RAM configuration is used, this achieves a three-cycle repetition rate for 
full quadword writes. 
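
The merge step of the read-modify-write can be illustrated with a short sketch. The function names are hypothetical, and the sketch assumes that bit n of the byte enable selects byte n of the quadword, counting from the least significant byte; ECC calculation over the merged quadword is not shown.

```python
def needs_read_modify_write(byte_enable):
    """A full quadword write (all eight bytes enabled) needs no merge."""
    return (byte_enable & 0xFF) != 0xFF


def merge_quadword(old, new, byte_enable):
    """Merge the enabled bytes of new into old and return the result."""
    merged = old
    for byte in range(8):
        if byte_enable & (1 << byte):
            mask = 0xFF << (8 * byte)
            merged = (merged & ~mask) | (new & mask)
    return merged
```

With all eight bytes enabled the merged value is simply the new data, which is why the read of the old data can be skipped for full quadword writes.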

When the write is complete, the entry is removed from the WRITE_QUEUE. 

13.7.7 Write Miss 

If the tag store lookup for a write is done and the ownership bit is not set or the tag does not match, 
an ownership read is issued to the memory subsystem through the NON_WRITEBACK_QUEUE. 
At the same time, the new tag is written to the backup cache tag store with cleared VALID and 
OWNED bits. When the requested quadword returns through the NDAL_IN_QUEUE, the write 
data is merged with the fill data, ECC is calculated, and the new data is written to the cache 
RAMs. At this time the write is removed from the WRITE_QUEUE. When the fourth quadword 
returns, the valid bit and the ownership bit are set in the tag store. None of the fill data is sent 
to the Mbox, since the request originated from a write rather than from an Mbox read. 

13.7.8 Deallocates Due to CPU Reads and Writes 

When any tag lookup for a read or a write results in a miss, the cache block is deallocated to 
allow the fill data to take its place. If the block is not valid, no action is taken for the deallocate. 
If the block is valid but not owned, the block is invalidated in the backup cache tag store and 
an invalidate is sent to the Pcache. If the block is valid and owned, the block is written back 
to memory, invalidated in the tag store, and an invalidate is sent to the Pcache. The Hexaword 
Disown Write command is used to write the data back. 

If a writeback is necessary, it is done immediately after the read or write miss occurs. The miss 
and the deallocate are contiguous events and are not interrupted for any other transaction. 

When the block is invalidated or deallocated at the time of the miss, the VALID and OWNED bits 
are cleared. The TAG is written with a value corresponding to the address of the read or write 
which just missed. When the fill returns, the VALID and OWNED bits are written appropriately. 

The four quadwords for the deallocate are read out from the Bcache in the order shown in 
Table 13-53. They are driven on the NDAL in order from QW0 to QW3, however, as required by 
the NDAL protocol for hexaword writes. 

13.7.9 DREAD_LOCK and WRITE_UNLOCK 

The Cbox receives DREAD_LOCK/WRITE_UNLOCK pairs from the Mbox. It never issues those 
commands on the NDAL. The Cbox always uses Ownership Read-Disown Write on the NDAL 
and depends on use of the ownership bit in memory to accomplish interlocks. 

When the cache is on, a DREAD_LOCK which produces an owned hit in the backup cache causes
no memory access. All four quadwords are read out of the Bcache and sent to the Mbox. The
address is placed in the FILL_CAM to prevent any access of the block until the WRITE_UNLOCK
is done.

A DREAD_LOCK which does not produce an owned hit in the backup cache results in an OREAD
on the NDAL, whether the cache is on or off. When the cache is on, the WRITE_UNLOCK is
written into the backup cache and is only written to memory if requested through a coherence
transaction or due to a deallocate. When the cache is off, the WRITE_UNLOCK becomes a
Quadword Disown Write on the NDAL.

When a DREAD_LOCK arrives in the DREAD_LATCH, the WRITE_QUEUE is flushed before
the DREAD_LOCK is started. All transactions from the IREAD_LATCH or the DREAD_LATCH
are prevented until the WRITE_UNLOCK takes place or until the RDLK bit in the FILL_CAM
is cleared through an IPR_WRITE to the CEFSTS IPR.

During READ_LOCK/WRITE_UNLOCK processing, the NDAL_IN_QUEUE is serviced
normally, so if the cache is on, the NDAL may see some writebacks while the
DREAD_LOCK/WRITE_UNLOCK is in progress.

When the Bcache is running in normal mode, a WRITE_UNLOCK is not looked up in the tag 
store as it is guaranteed to be owned in the cache. The arbiter initiates a read-modify-write 
directly to the data RAMs without any tag store access at all. If the Bcache is in ETM, the 
WRITE_UNLOCK is looked up, as the block may or may not be owned in the cache. 

When the Bcache is off, a WRITE_UNLOCK which is done without a preceding READ_LOCK
will be sent directly to the NDAL. In any other mode of Bcache operation, the WRITE_UNLOCK
is expected to be preceded by a READ_LOCK. When the cache is off, a WRITE_UNLOCK without
a preceding READ_LOCK may be useful for error handling (this is not currently implemented in 
the microcode). 
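
The interlock rules in this section can be summarized in a short sketch. It is a simplified Python model under stated assumptions: the function names and the mode labels "ON", "ETM", and "OFF" are hypothetical, and FORCE_HIT mode is not modeled.

```python
def dread_lock_ndal_command(cache_on, owned_hit):
    """Return the NDAL command caused by a DREAD_LOCK, or None if none is needed."""
    if cache_on and owned_hit:
        # Data is supplied by the Bcache; the address is held in the FILL_CAM.
        return None
    # Any other case results in an OREAD, whether the cache is on or off.
    return "OREAD"


def write_unlock_needs_tag_lookup(mode):
    """Return True if a WRITE_UNLOCK is looked up in the tag store."""
    # In normal mode the block is guaranteed to be owned, so no lookup is done.
    # In ETM the block may or may not be owned, so the lookup is required.
    # When the cache is off, no Bcache access is done at all.
    return mode == "ETM"
```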



13.8 Cache Coherency 

Since NVAX is used in multiprocessor systems, cache coherency requests requiring invalidates 
and/or writebacks arrive on the NDAL. These may require action in the Bcache and/or the Pcache. 
Under normal conditions, the Cbox ensures that the Pcache is a subset of the Bcache, as explained 
below. Thus, it is able to filter invalidate requests so that not all are sent to the Pcache. 

Table 13-54 shows the actions taken in the Bcache, based on the NDAL command which arrives 
and matches a cache block. 



Table 13-54: NVAX Backup Cache Invalidates and Writebacks

NDAL Command   Invalid block   Valid & Unowned   Valid & Owned
------------   -------------   ---------------   -----------------------------------------
IREAD,DREAD                                      Writeback, set Bcache valid-unowned state
OREAD                          Invalidate        Writeback, Invalidate
WRITE                          Invalidate        Writeback, Invalidate
WDISOWN


Whenever an invalidate is necessary in the Bcache, according to Table 13-54, an invalidate is
also sent to the Pcache.

Invalidates are sent to the Pcache under the following circumstances: 

1. When an invalidate is necessary in the Bcache, due to a cache coherency request, the 
invalidate is also forwarded to the Pcache. 

2. When a cache miss causes a Bcache deallocate, a corresponding invalidate is forwarded to 
the Pcache. 

3. When a write to BCFLUSH causes a bcache deallocate, a corresponding invalidate is 
forwarded to the Pcache. 

4. When an OREAD or WRITE cache coherency request matches an entry in the FILL_CAM,
the invalidate is forwarded immediately to the Pcache. When the last fill returns, a second 
invalidate is forwarded to the Pcache. 

5. When the Bcache is off or in FORCE_HIT mode, ALL cache coherency requests result in
invalidates to the Pcache. It is not strictly necessary to send invalidates for IREAD and 
DREAD cache coherency requests, as multiple caches may contain read-only copies of data, 
but for implementation reasons they ARE sent as invalidates to the Pcache. 

6. When the Bcache is in ETM, all OREAD and WRITE cache coherency requests result in 
invalidates to the Pcache. (IREAD and DREAD cache coherency requests do not result in 
invalidates to the Pcache.) A second invalidate is passed to the Pcache if the normal Bcache 
lookup conditions are met. 

NOTE 

When a cache coherency request hits in the cache and either VALID or OWNED is 
modified, the tag which is written to the cache is the same as the tag which was there 
originally. 
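
The Bcache actions of Table 13-54, together with the rule that every Bcache invalidate is also forwarded to the Pcache, can be expressed as a lookup. This is a hedged sketch of normal (ON) operation only; the dictionary name, function name, and state strings are hypothetical, and the FILL_CAM, OFF, FORCE_HIT, and ETM cases from the list above are not modeled.

```python
# (NDAL command, block state) -> Bcache actions, transcribed from Table 13-54.
# Combinations that are not listed require no Bcache action.
BCACHE_ACTIONS = {
    ("IREAD", "valid-owned"): ("writeback", "set valid-unowned"),
    ("DREAD", "valid-owned"): ("writeback", "set valid-unowned"),
    ("OREAD", "valid-unowned"): ("invalidate",),
    ("OREAD", "valid-owned"): ("writeback", "invalidate"),
    ("WRITE", "valid-unowned"): ("invalidate",),
    ("WRITE", "valid-owned"): ("writeback", "invalidate"),
}


def coherency_response(command, state):
    """Return (bcache_actions, pcache_invalidate) for a matching coherency request."""
    bcache = BCACHE_ACTIONS.get((command, state), ())
    # An invalidate in the Bcache is always forwarded to the Pcache as well.
    pcache_invalidate = "invalidate" in bcache
    return bcache, pcache_invalidate
```

Because the Pcache is normally a subset of the Bcache, requests that need no Bcache invalidate are filtered and never reach the Pcache in this model.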

13.9 Abnormal conditions

This section describes the various modes of Bcache behavior as well as Cbox response when it 
detects an error. 

The Bcache has four operating states which are controlled by the following bits in the CCTL 
register: ENABLE, FORCE_HIT, SW_ETM, and HW_ETM. The four states are ON, OFF, ETM,
and FORCE_HIT. The four states are determined and prioritized as follows:

1. OFF. If the ENABLE bit is cleared in CCTL, the Bcache is OFF and those conditions take 
precedence. 

2. FORCE_HIT. If the ENABLE bit is set and FORCE_HIT is set, the Bcache is in FORCE_HIT
mode and those conditions take precedence. 

3. ETM. If the ENABLE bit is set, FORCE_HIT is cleared, and either SW_ETM or HW_ETM is 
set, the cache is in ETM mode and those conditions take precedence. 

4. ON. If the ENABLE bit is set and FORCE_HIT, SW_ETM, and HW_ETM are cleared, the
cache is ON. 
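
The priority order above can be sketched as a decode function. This is a minimal illustration in Python; the function name and the boolean parameters standing in for the CCTL bits are assumptions of the sketch, not the register layout.

```python
def bcache_state(enable, force_hit, sw_etm, hw_etm):
    """Decode the Bcache operating state from the four CCTL control bits."""
    if not enable:
        return "OFF"        # highest priority: ENABLE cleared
    if force_hit:
        return "FORCE_HIT"  # takes precedence over either ETM bit
    if sw_etm or hw_etm:
        return "ETM"
    return "ON"
```

Note that FORCE_HIT wins over ETM in this ordering, so `bcache_state(True, True, False, True)` yields "FORCE_HIT".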

The ON state is the normal operating condition of the cache. OFF, FORCE_HIT, and ETM modes 
are described in the sections which follow. A summary of the backup cache behavior when it is 
ON and incurring no errors is given in Table 13-55.

Table 13-55: Backup cache behavior while it is ON

                    ------------------------------ Cache Response ------------------------------
Cache               Miss           Miss            Miss             Hit             Hit
Transaction         Invalid        Valid           Owned            Valid           Owned
------------------  -------------  --------------  ---------------  --------------  ----------------
CPU IREAD, DREAD    READ memory    Read memory,    Read memory,     Read cache      Read cache
                                   Pcache inval    Pcache inval,
                                                   Bcache dealloc

CPU Read Modify     OREAD memory   OREAD memory,   OREAD memory,    Read cache      Read cache
                                   Pcache inval    Pcache inval,
                                                   Bcache dealloc

CPU READ_LOCK       OREAD memory   OREAD memory,   OREAD memory,    OREAD memory,   Read cache
                                   Pcache inval    Pcache inval,    Pcache inval
                                                   Bcache dealloc

CPU Write           OREAD memory   OREAD memory,   OREAD memory,    OREAD memory,   Write cache
                                   Pcache inval    Pcache inval,    Pcache inval
                                                   Bcache dealloc

CPU WRITE_UNLOCK    No tag store lookup; write Bcache unconditionally

Fill for OREAD      Write cache with fill data and write data; set TS valid-owned
for Write

Fill for OREAD      Write cache with fill data; set TS valid-owned

Fill for READ       Write cache with fill data; set TS valid

NDAL IREAD, DREAD   No action for a miss                            No Action       Writeback, set
                                                                                    Bcache
                                                                                    valid-unowned

NDAL OREAD, WRITE   No action for a miss                            Bcache inval,   Writeback, Bcache
                                                                    Pcache inval    inval, Pcache
                                                                                    inval

13.9.1 Cbox Behavior When the Backup Cache is OFF

The backup cache may be off for three reasons: the chip has just powered up, the system contains 
no backup cache, or software has disabled the cache by clearing the ENABLE bit in the Cbox 
control register. 

When the cache is off, no accesses to the backup cache are done. Errors are not detected and 
cache state is UNCHANGED unless explicitly changed by software through IPR reads and writes. 

When the backup cache is off, all Ownership-Invalidate cache coherency requests (as the result 
of OREADs or WRITEs) which arrive are forwarded as invalidates to the Mbox, as the data may 
be valid in the Pcache. All reads from the Mbox go directly to the NON_WRITEBACK_QUEUE, 
and an entry in the FILL_CAM is allocated. Fills which return are sent directly to the Mbox
without accessing the Bcache, and when the last fill for a block arrives, the FILL_CAM entry is
cleared. All writes except WRITE_UNLOCKs go directly to the NON_WRITEBACK_QUEUE. 1

When the cache is off, a DREAD_LOCK/WRITE_UNLOCK pair from the Mbox becomes Hexaword 
Ownership Read/Quadword Disown Write on the NDAL. 

All writes issued from NVAX when it is operating without a backup cache are of quadword 
length. Memory reads are of hexaword length since the Pcache block size is a hexaword. Even if 
the Pcache is off, a hexaword of data is returned to the Mbox. 

A DREAD_MODIFY command from the Mbox normally becomes an OREAD on the NDAL when
it misses in the cache. However, when the cache is off, a normal DREAD is used on the NDAL. 



1 If P%CPU_WB_ONLY_L is asserted, the WRITE_UNLOCK must be allowed to proceed. Only the
WRITEBACK_QUEUE continues when P%CPU_WB_ONLY_L is asserted, so the WRITE_UNLOCK must go through 
the WRITEBACK_QUEUE. 
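
The routing rules for the OFF state can be collected into a small table. This is an illustrative sketch: the function name is hypothetical, the NDAL command strings paraphrase the text above, and only the Mbox commands explicitly discussed in this section are included.

```python
def off_mode_route(mbox_command):
    """Return (outgoing queue, NDAL command) for an Mbox command with the Bcache OFF."""
    routes = {
        # Reads go to the NON_WRITEBACK_QUEUE and allocate a FILL_CAM entry.
        "DREAD_LOCK": ("NON_WRITEBACK_QUEUE", "Hexaword Ownership Read"),
        # With the cache off, a normal DREAD is used instead of an OREAD.
        "DREAD_MODIFY": ("NON_WRITEBACK_QUEUE", "DREAD"),
        # All writes issued without a backup cache are of quadword length.
        "WRITE": ("NON_WRITEBACK_QUEUE", "Quadword Write"),
        # The WRITE_UNLOCK is the one write that uses the WRITEBACK_QUEUE,
        # so that it can proceed while P%CPU_WB_ONLY_L is asserted.
        "WRITE_UNLOCK": ("WRITEBACK_QUEUE", "Quadword Disown Write"),
    }
    return routes[mbox_command]
```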

13.9.2 Cbox Behavior When the Backup Cache is in FORCE_HIT Mode

FORCE_HIT mode is intended to be used for testing purposes only. It is used when the cache is
enabled. 

When FORCE_HIT is set, all memory space reads and writes to the Bcache, both Istream and 
Dstream, are forced to hit. Tag store state is not changed at all; the data RAMs are accessed as 
if the tag store access produced an owned-valid hit. Cache coherency transactions are treated as 
they are when the cache is off: they are not looked up in the backup cache, they are all forwarded 
to the Mbox, and cache state is not changed as the result of the cache coherency requests. 

When the Bcache is in FORCE_HIT mode, deallocates are not done. Even if the tag matches and 
the VALID and OWNED bits are set, the block is not written back. The implication of this is that 
if FORCE_HIT mode is being used while running in a multiprocessor environment, the Bcache 
must be flushed of all owned blocks beforehand. 

Tag store and data RAM ECC errors are detected in FORCE_HIT mode if DISABLE_ERRORS 
in the CCTL register is not set, resulting in the usual error handling. 

Suppose the ECC logic for the data RAMs is to be tested. Put the cache in FORCE_HIT mode.
Set SW_ECC in the Cbox control register. Write the desired ECC into BCDECC. Do a Dstream 
write to the desired location, and the location will be written using ECC from BCDECC rather 
than from Cbox-generated ECC. Suppose the ECC written is such that when the data is read, an 
ECC error will be flagged. 

Now perform a read of the location while FORCE_HIT is still set. The read will result in an ECC 
error, showing that the logic is working correctly. The data RAM error registers may be read and
will correspond to the induced error. 
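
The test procedure above can be walked through with a toy software model. Everything in this sketch is an assumption made for illustration: `ToyBcacheDataRam` and `toy_ecc` are hypothetical, and the XOR check code is a stand-in that is NOT the real Bcache ECC. Only the order of the steps follows the text.

```python
def toy_ecc(data):
    """Stand-in check code: XOR of the eight data bytes (not the real ECC)."""
    ecc = 0
    for byte in data.to_bytes(8, "little"):
        ecc ^= byte
    return ecc


class ToyBcacheDataRam:
    """Toy model of data RAM writes and reads in FORCE_HIT mode."""

    def __init__(self):
        self.sw_ecc = False  # models SW_ECC in the Cbox control register
        self.bcdecc = 0      # models the BCDECC register
        self.ram = {}

    def write(self, addr, data):
        # With SW_ECC set, the check bits come from BCDECC, not the generator.
        ecc = self.bcdecc if self.sw_ecc else toy_ecc(data)
        self.ram[addr] = (data, ecc)

    def read(self, addr):
        data, ecc = self.ram[addr]
        return data, ecc != toy_ecc(data)  # (data, ecc_error_flagged)


ram = ToyBcacheDataRam()
data = 0x0123456789ABCDEF
ram.sw_ecc = True                    # set SW_ECC
ram.bcdecc = toy_ecc(data) ^ 0x01    # write deliberately wrong check bits
ram.write(0x100, data)               # Dstream write uses BCDECC
_, error_flagged = ram.read(0x100)   # the read flags the induced error
```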

13.9.3 Cbox Behavior When the Backup Cache is in Error Transition Mode 

When the Cbox detects certain errors, as described in Chapter 3 and Section 13.4.2, it puts itself 
into Error Transition Mode. 

The goals of the Cbox design during ETM are the following: 

1. Preserve the state of the cache as much as possible for diagnostic software. 

2. Honor Mbox references which hit owned blocks in the backup cache since this is the only 
source of data in the system. 

3. Respond to NDAL cache coherency requests normally. 

Once the Cbox enters Error Transition Mode, it remains in ETM until software explicitly disables 
or enables the cache. To ensure cache coherency, the cache must be completely flushed of valid
blocks before it is re-enabled because some data can become stale while the cache is in ETM. 

Table 13-56 describes how the backup cache behaves while it is in ETM.

Table 13-56: Backup cache behavior during ETM

                        ---------------------- Cache Response ----------------------
Cache Transaction       Miss            Valid hit       Owned hit
----------------------  --------------  --------------  ------------------------------
CPU IREAD,DREAD         Read memory     Read memory     Read cache
CPU Read Modify         Read memory     Read memory     Read cache
CPU READ_LOCK           OREAD memory    OREAD memory    OREAD memory, Bcache dealloc 1
CPU Write               Write memory    Write memory    Write memory, Bcache dealloc 1
CPU WRITE_UNLOCK        Write memory    Write memory    Write cache 1

Fill (from read         Normal cache behavior
started before ETM)

Fill (from read         Do not update backup cache; return data to Mbox
started during ETM)

NDAL cache coherency    Normal cache behavior except that o-inval always goes to Pcache 2
request

1 Done to preserve write ordering; no invalidate is sent to the Pcache. For the READ_LOCK (or WRITE), the block
writeback may be done before OR after the OREAD (or WRITE).

2 The tag store controller looks up the invalidate request normally; if the lookup was an o-inval (due to an OREAD or a
WRITE on the NDAL), the Cbox arbiter unconditionally forwards an invalidate to the Pcache. If the hit conditions are
met in the cache, a second invalidate for the same block is forwarded to the Pcache (the tag store controller behaves as
it does in normal mode.)



Any reads or writes which do not hit valid-owned during ETM are sent to memory: read data is 
retrieved from memory, and writes are written to memory, bypassing the cache entirely. 

The cache supplies data for Ireads, Dreads, and Dread Modifys which hit valid-owned; this is 
normal cache behavior. 

If a write hits a valid-owned block in the cache, the block is written back to memory and the write 
is also sent to memory. The write leaves the Cbox through the NON_WRITEBACK_QUEUE, 
enforcing write ordering with previous writes which may have missed in the cache. 

If a READ_LOCK hits valid-owned in the cache, a writeback of the block is forced and the
READ_LOCK is sent to memory (as an OREAD on the NDAL). This behavior enforces write
ordering between previous writes which may have missed in the cache and the WRITE_UNLOCK
which will follow the READ_LOCK.
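
The CPU rows of the ETM behavior described above reduce to a lookup on whether the reference hit valid-owned. This is a hedged sketch: the table and function names are hypothetical, and fills and NDAL coherency requests are not modeled.

```python
# Responses when the reference hits a valid-owned block during ETM.
ETM_OWNED_HIT = {
    "IREAD": "read cache",
    "DREAD": "read cache",
    "READ_MODIFY": "read cache",
    "READ_LOCK": "OREAD memory, Bcache dealloc",
    "WRITE": "write memory, Bcache dealloc",
    "WRITE_UNLOCK": "write cache",
}

# Responses for a miss or a valid (unowned) hit: the cache is bypassed.
ETM_NOT_OWNED = {
    "IREAD": "read memory",
    "DREAD": "read memory",
    "READ_MODIFY": "read memory",
    "READ_LOCK": "OREAD memory",
    "WRITE": "write memory",
    "WRITE_UNLOCK": "write memory",
}


def etm_cpu_response(transaction, lookup):
    """lookup is one of 'miss', 'valid hit', or 'owned hit'."""
    table = ETM_OWNED_HIT if lookup == "owned hit" else ETM_NOT_OWNED
    return table[transaction]
```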

The write ordering problem to which the previous two paragraphs allude is as follows: Suppose 
the cache is in ETM. Also suppose that under ETM, writes which hit owned in the cache are 
written to the cache while writes which miss are sent to memory. Write A misses in the cache 
and is sent to the non-writeback queue, on its way to memory. Write B hits owned in the cache 
and is written to the cache. A cache coherency request arrives for block B and that block is placed 
in the writeback queue. If Write A has not yet reached the NDAL, Writeback B can pass it since 
the writeback queue has priority over the non-writeback queue. If that happens, the system sees
write B while it is still reading old data in block A, because write A has not yet reached memory. 
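
The ordering hazard can be reproduced with a two-queue model in which the writeback queue always has priority. This is an illustrative simulation only; the function name and the string labels for the queue entries are hypothetical.

```python
from collections import deque


def ndal_issue_order(writeback_queue, non_writeback_queue):
    """Drain both queues, always preferring the writeback queue."""
    wb = deque(writeback_queue)
    nwb = deque(non_writeback_queue)
    order = []
    while wb or nwb:
        order.append(wb.popleft() if wb else nwb.popleft())
    return order


# Hypothetical policy: write B is absorbed by the cache and later leaves
# as a writeback, so it passes write A on its way to the NDAL.
hazard = ndal_issue_order(["writeback B"], ["write A"])

# Actual ETM policy: write B is itself sent through the non-writeback
# queue behind write A, so the two writes stay in program order.
safe = ndal_issue_order(["writeback of old block B"], ["write A", "write B"])
```

In the second case the forced writeback carries only data older than write B, so its position relative to write A does not matter.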

Referring again to Table 13-56, note that a WRITE_UNLOCK that hits owned during ETM is
written directly to the cache. There is only one case where a WRITE_UNLOCK will hit owned
during ETM: if the READ_LOCK which preceded it was performed before the cache entered ETM.
(Either the READ_LOCK itself or an invalidate performed between the READ_LOCK and the
WRITE_UNLOCK caused the entry into ETM.) In this case, we know that no previous writes are
in the non-writeback queue because writes are not put into the non-writeback queue when we 
are not in ETM. (There may be I/O space writes in the non-writeback queue but ordering with 
I/O space writes is not a constraint.) Therefore there is not a write ordering problem as in the 
previous paragraph. 

Table 13-56 shows that during ETM, cache coherency requests are treated as they are during
normal operation. 

Fills as the result of any type of read originated before the cache entered ETM are processed in 
the usual fashion. If the fill is as a result of a write miss, the write data is merged, as usual, 
as the requested fill returns. Fills caused by any type of read originated during ETM are not 
written into the cache or validated in the tag store. 

During ETM, the state of the cache is modified as little as possible. Table 13-57 shows how each 
transaction modifies the state of the cache. 



Table 13-57: Backup cache state changes during ETM

                        -------------------- Cache State Modified --------------------
Cache Transaction       Miss     Valid hit   Owned hit
----------------------  -------  ----------  -----------------------------------------------
CPU IREAD,DREAD,        None.    None.       None.
Read Modify
CPU READ_LOCK           None.    None.       Clear VALID & OWNED; change TS_ECC accordingly.
CPU Write               None.    None.       Clear VALID & OWNED; change TS_ECC accordingly.
CPU WRITE_UNLOCK        None.    None.       Write new data, change DR_ECC accordingly.

Fill (from read         Write new TS_TAG, TS_VALID, TS_OWNED, TS_ECC, DR_DATA, DR_ECC
started before ETM)

Fill (from read         None.
started during ETM)

NDAL cache coherency    Clear VALID & OWNED; change TS_ECC accordingly
request



13.9.4 Cbox transition into Error Transition Mode 

When the BIU encounters an error which induces ETM, it sends an explicit transaction to 
the arbiter requesting that the Cbox enter ETM. When the arbiter services this transaction, 
CCTL<HW_ETM> is set. The next transaction serviced by the arbiter will be under ETM. 

When the backup cache tag store or data RAM controller encounters an ETM-inducing error, it
sets CCTL<HW_ETM> immediately. 

The arbiter picks up the new value of CCTL<HW_ETM> whenever it starts a new transaction. 
The tag store controller picks up the new value whenever the arbiter instructs it to start a new 
transaction. For a given transaction, the arbiter and the tag store always see the same value of 
ETM. Since they pick up the state of ETM at the beginning of every transaction, the Cbox always 
enters ETM in a predictable way. 

Although the data RAM controller may cause the assertion of HW_ETM, it does not use ETM in
processing its transactions. 

In general, if a transaction starts when the Bcache is operating normally, and it encounters an 
ETM-inducing error, the next transaction is handled in ETM. There is one exception: If a read 
is looked up in the tag store and hits, the data RAM controller looks up the data in the backup 
cache. While the data is being read out of the RAMs, the tag store controller may start a lookup 
for a quadword write. If the quadword write hits, the write WILL be done to the backup cache 
even if the read data encounters an ETM-inducing error before the write is done to the Bcache. 
This sequence would be as follows: 

1. Tag store lookup and Data RAM lookup for READ A start. 

2. Tag store lookup for READ A completes. 

3. Tag store lookup for Quadword Write B starts. 

4. Data RAM lookup for Read A encounters an ETM-inducing error. 

5. Tag store lookup for Quadword Write B completes; it was a hit. 

6. Data RAM lookup for Read A completes. 

7. Data RAM write for Quadword Write B is carried out to the Bcache. 

Quadword Write B completed as if the Bcache were operating normally. If the tag store lookup for 
the Quadword Write had not started until after the ETM-inducing error had been encountered, 
then the Quadword Write would have been carried out under ETM, and the write would have 
been done directly to memory. 
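
The exception follows directly from the rule that the ETM state is sampled once, when a transaction's tag store lookup starts. The sketch below models only that sampling rule; the function name and event labels are hypothetical.

```python
def latch_etm_per_transaction(events):
    """Return, for each transaction, the ETM value it sampled at its start."""
    hw_etm = False
    latched = {}
    for kind, name in events:
        if kind == "tag_lookup_start":
            latched[name] = hw_etm  # value is fixed for the whole transaction
        elif kind == "etm_error":
            hw_etm = True           # CCTL<HW_ETM> is set immediately
        else:
            raise ValueError(kind)
    return latched


# The sequence listed above: Write B starts before Read A's data RAM error.
exception_case = latch_etm_per_transaction([
    ("tag_lookup_start", "READ A"),
    ("tag_lookup_start", "WRITE B"),
    ("etm_error", "READ A"),
])

# If Write B's lookup starts after the error, it is handled under ETM.
later_write = latch_etm_per_transaction([
    ("tag_lookup_start", "READ A"),
    ("etm_error", "READ A"),
    ("tag_lookup_start", "WRITE B"),
])
```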






DIGITAL CONFIDENTIAL 



NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 



13.9.5 How to turn the Bcache off 

Because the Bcache is a writeback cache, care must be taken to maintain cache coherency when 
turning it off. 

If the cache is running normally and software wishes to turn it off, it must do the following: 

1. Write CCTL to set SW_ETM. In this mode, the Bcache will not allocate any new blocks and 
will send all cache coherency requests to the Mbox as invalidates. 

2. Use the BCFLUSH register to flush all owned blocks out of the cache. 

3. Turn off the Bcache by writing CCTL to clear ENABLE and SW_ETM simultaneously. If an 
error was encountered during the deallocate process, HW_ETM may be set and if so, should
be cleared as well. 

If the Bcache encounters an uncorrectable ECC error, the Cbox sets HW_ETM in the CCTL 
register. If software wishes to turn off the cache, it must do the following: 

1. Use the BCFLUSH register to flush all owned blocks out of the cache. 

2. Write CCTL to clear ENABLE and clear HW_ETM simultaneously. This turns off the Bcache. 
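
Both shutdown sequences can be sketched as one routine. This is a software model under stated assumptions: `cctl` is a plain dictionary standing in for the CCTL register, and `flush_owned_blocks` is a hypothetical callback representing the BCFLUSH writes; neither is a real interface.

```python
def turn_off_bcache(cctl, flush_owned_blocks):
    """Apply the turn-off steps to a dict with ENABLE, SW_ETM, HW_ETM keys."""
    if not cctl["HW_ETM"]:
        # Cache running normally: stop new allocations first.
        cctl["SW_ETM"] = True
    # Flush all owned blocks; an error here may cause HW_ETM to be set.
    flush_owned_blocks(cctl)
    # A single CCTL write clears ENABLE and both ETM bits together.
    cctl.update(ENABLE=False, SW_ETM=False, HW_ETM=False)
    return cctl
```

The flush is performed while the cache is still enabled and in ETM, which is why the ENABLE bit is cleared only in the final write.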

If Bcache errors are happening, but only in part of the cache, software may be able to avoid the 
errored portion of the cache by disabling it through use of the SIZE field in CCTL. If part of the 
cache is failing, a smaller cache size may be selected so that only part of the cache RAMs are 
being used. The cache must be flushed before changing the cache size so that the tags are correct. 

This only works if the smallest cache size is not being used to begin with, and if the failing areas 
of cache do not fall within the range of the smaller cache size selected. 

13.9.6 How to turn the Bcache on 

When NVAX powers up, garbage data is stored in the Bcache tags and data. This would result 
in ECC errors if the cache were turned on immediately. 

Through IPR writes, every Bcache tag store entry must be written with cleared OWNED and 
VALID bits. The value written to the TAG is irrelevant, as long as correct ECC is written to the 
TAG store. 

The Bcache data RAMs must also be initialized with correct ECC on powerup. FORCE_HIT 
mode may be used to initialize the Bcache data RAMs with correct ECC. If full quadword writes 
are used, no data RAM errors will be detected during this process, since the RAMs are written 
without being read first. If partial quadword writes are used, errors will be detected because of 
the read-modify-write which is necessary. If the programmer sets the DISABLE_ERRORS bit in 
the CCTL register, the Cbox will ignore these errors. 

Once the tag store and data RAMS have been initialized, the cache may be enabled by setting 
ENABLE in the CCTL register. 

If the Bcache is in ETM, it may be incoherent with respect to other CPUs and memory because 
of how it treats writes which hit valid but not owned in the cache (see Table 13-56). In addition,
the Pcache, if enabled, is no longer a subset of the backup cache. The procedure for turning on 
the Pcache and the Bcache described in Chapter 16 must be followed. 

If the Bcache is operating normally and is turned off for some reason, the programmer must 
ensure that when it is reenabled, all the OWNED and VALID bits are cleared. 

13.9.7 Assertion of P%CPU_WB_ONLY_L

When P%CPU_WB_ONLY_L is asserted, NVAX may only arbitrate in order to issue Disown 
Writes on the NDAL. When P%CPU_WB_ONLY_L is asserted, the Cbox continues to process 
transactions from the NDAL_IN_QUEUE normally, performing writebacks as necessary. With 
one exception described below, the Cbox arbiter prevents all new reads and writes from the Mbox 
while P%CPU_WB_ONLY_L is asserted. Therefore, if P%CPU_WB_ONLY_L is asserted for
long periods of time, CPU performance could be adversely impacted.

The exception to the rule is the following: If a READ_LOCK from the Mbox is in progress when
P%CPU_WB_ONLY_L is asserted, the WRITE_UNLOCK from the write queue must be allowed
to complete. Otherwise, deadlock could occur if the system asserted P%CPU_WB_ONLY_L until
it received data from the WRITE_UNLOCK.

Therefore, while P%CPU_WB_ONLY_L is asserted, the write queue is permitted to continue if a
READ_LOCK is in progress. The READ_LOCK is completed when either the WRITE_UNLOCK
is issued and completed, or an "IPR WRITE_UNLOCK" to CEFSTS is issued and completed.
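
The gating rule described above can be written as a predicate. This sketch covers only the effect of P%CPU_WB_ONLY_L; the function name and the source labels are hypothetical, and other arbitration constraints (such as reads being held off during a READ_LOCK) are deliberately not modeled.

```python
def cbox_may_service(source, wb_only_asserted, read_lock_in_progress):
    """Return True if the arbiter may service a request from the given source."""
    if source == "NDAL_IN_QUEUE":
        return True  # coherency requests and fills are always processed
    if not wb_only_asserted:
        return True  # no restriction from P%CPU_WB_ONLY_L
    # While asserted, only the write queue may continue, and only so that a
    # pending WRITE_UNLOCK can complete an outstanding READ_LOCK.
    return source == "WRITE_QUEUE" and read_lock_in_progress
```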

During the cycle in which P%CPU_WB_ONLY_L is asserted, the Cbox may issue a non-writeback 
command on the NDAL. It is up to the NDAL arbiter not to grant to NVAX again during 
that cycle, so that the Cbox does not issue another non-writeback command in the following 
cycle. If the NDAL arbiter does assert P%CPU_GRANT_L during the same cycle in which 
P%CPU_WB_ONLY_L is asserted, NVAX may drive another non-writeback command on the
NDAL in the following cycle which was granted. 

There is one interesting error case which can occur when P%CPU_WB_ONLY_L is asserted. It 
is as follows: 

Normally, when the Cbox has a READ_LOCK outstanding and it receives an O_INVAL cache
coherency request (OREAD or WRITE to the block), it sets the OIP bit in the FILL_CAM
(O_INVAL pending). If the Cbox receives an R_INVAL cache coherency request, it sets the RIP
bit in the FILL_CAM (R_INVAL pending). When the Ebox issues the WRITE_UNLOCK and the
Cbox arbiter sees that RIP or OIP is set, it issues a block writeback to the NDAL. This is done
even if P%CPU_WB_ONLY_L is asserted.

If some error occurs which prevents the Ebox from issuing the WRITE_UNLOCK, it sends the
Cbox an "IPR WRITE_UNLOCK" to clear the READ_LOCK out of the FILL_CAM. This "IPR
WRITE_UNLOCK" clears the FILL_CAM entry but the Cbox arbiter DOES NOT check the status
of RIP and OIP to see if we need to do a writeback.

The implication is that if the Cbox is in the middle of a READ_LOCK/WRITE_UNLOCK
and a cache coherency transaction arrives for the block, AND the Ebox never issues the
WRITE_UNLOCK due to some error (see below), the Cbox will NOT write back that block in
response to the former invalidate. (The Cbox would write the block back if a subsequent cache
coherency request arrived.) The following errors would cause this situation: TB parity error after
issuing the read lock; Ebox S3 stall timeout after issuing the read lock; an uncorrectable error in
the Backup cache data RAMs on the first quadword of the read lock.

This could cause a deadlock in a system if the system had asserted P%CPU_WB_ONLY_L 
because it was waiting for the writeback. NVAX might never issue the writeback and the Cbox 
stops processing after the "IPR write unlock", until P%CPU_WB_ONLY_L is deasserted. 

One solution to the deadlock is for the system element which is waiting for the writeback
to have a timeout counter, so that it does not wait forever. Once the element times out,
P%CPU_WB_ONLY_L should be deasserted and the system can continue to operate. Or if
the cache coherency transaction is reissued on the NDAL after the completion of the "IPR
WRITE_UNLOCK", the Cbox WILL service it.
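
The difference between the two ways of completing a READ_LOCK can be shown with a small model of the pending bits. This is an illustrative sketch; the class `ReadLockEntry` and its method names are hypothetical and model only the RIP and OIP bits of a FILL_CAM entry.

```python
class ReadLockEntry:
    """Toy model of the RIP/OIP bits held for an outstanding READ_LOCK."""

    def __init__(self):
        self.rip = False
        self.oip = False

    def coherency_request(self, kind):
        if kind == "O_INVAL":
            self.oip = True
        elif kind == "R_INVAL":
            self.rip = True
        else:
            raise ValueError(kind)

    def complete(self, via_ipr_write_unlock):
        """Clear the entry; return True if a block writeback is issued."""
        # The IPR WRITE_UNLOCK path does not check RIP or OIP.
        writeback = (self.rip or self.oip) and not via_ipr_write_unlock
        self.rip = self.oip = False
        return writeback
```

A system element waiting on that writeback therefore needs its own timeout, or must reissue the coherency request, as described above.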

13.9.8 Backup Cache Errors

In general, the Cbox logs as much state as possible concerning errors and notifies the Ebox and/or 
Mbox that an error has occurred. For every error, the Cbox asserts either C%CBOX_H_ERR_H or
C%CBOX_S_ERR_H to notify the interrupt section of a hard error or a soft error, respectively. The
Cbox also notifies the Mbox if the error occurred on a fill to the Mbox. 

The backup cache goes into Error Transition Mode when it detects any uncorrectable error from 
the cache RAMs. 



Table 13-58: Backup Cache ECC Errors and NVAX CPU Error Responses

General Problem      Specific Situation       Action Taken by NVAX CPU
-------------------  -----------------------  ----------------------------------------------------
Correctable ECC      read hit for writeback   Cbox asserts C%CBOX_S_ERR_H. The data for the
error in the data    or read hit for          writeback is corrected and the writeback continues
RAMs                 deallocate IPR           normally.

                     read hit for Mbox        Cbox asserts C%CBOX_S_ERR_H. C%CBOX_ECC_ERR_H is
                                              asserted to tell the Mbox to ignore the uncorrected
                                              data. When the data has been corrected, it is
                                              driven to the Mbox. Hardware does not correct the
                                              error in the cache.

                     read for write hit       Cbox asserts C%CBOX_S_ERR_H. The corrected data is
                                              merged with the write data and written into the
                                              RAMs.

                     miss (any)               No error is reported.

Correctable ECC      read or write except     Cbox asserts C%CBOX_S_ERR_H, assumes the transaction
error in the tag     WUNLOCK (hit or miss)    missed, and sends a READ or an OREAD to memory. If
store                                         the location was owned, making a deallocate
                                              necessary, the outgoing address is corrected for
                                              the writeback. Note that if the transaction
                                              actually hit-owned, the read or oread is sent to
                                              the NDAL followed by a writeback of the same block.
                                              The errored location is corrected by hardware when
                                              the tag and valid bit are written for the fill.

                     WRITE_UNLOCK             No tag store lookup is done, so this case does not
                                              occur.

                     cache coherence          Cbox asserts C%CBOX_S_ERR_H. Hardware does not
                     transaction miss         correct the bad location; it may be done by
                                              software.

                     cache coherence          Cbox asserts C%CBOX_S_ERR_H. Writes the corrected
                     transaction hit          tag, valid, and owned bits back into the tag store
                                              when invalidating the entry. Uses corrected address
                                              for the writeback if necessary.

Table 13-58 (Cont): Backup Cache ECC Errors and NVAX CPU Error Responses

General Problem      Specific Situation       Action Taken by NVAX CPU
-------------------  -----------------------  ----------------------------------------------------
Uncorrectable ECC    read for writeback or    Cbox asserts C%CBOX_S_ERR_H, puts backup cache into
error in the data    deallocate IPR           ETM. The data cycle command for the NDAL is changed
RAMs (includes                                to BADWDATA and the writeback continues normally.
addressing errors)
                     VALID-OWNED or           Cbox asserts C%CBOX_S_ERR_H, puts backup cache into
                     VALID-UNOWNED            ETM. The CM_OUT_LATCH is loaded with the data and
                     read for Mbox            marked bad by asserting C%CBOX_HARD_ERR_H.

                     VALID-OWNED              Cbox asserts C%CBOX_S_ERR_H, puts backup cache into
                     DREAD_LOCK for           ETM. The CM_OUT_LATCH is loaded with the data and
                     Mbox, first quadword     marked bad by asserting C%CBOX_HARD_ERR_H. The
                     fails                    DREAD_LOCK entry remains in the FILL_CAM until
                                              microcode issues the "IPR write unlock". If RIP or
                                              OIP is set, it is not processed.

                     VALID-OWNED              Cbox asserts C%CBOX_S_ERR_H, puts backup cache into
                     DREAD_LOCK               ETM. The CM_OUT_LATCH is loaded with the data and
                     for Mbox, quadword       marked bad by asserting C%CBOX_HARD_ERR_H. The
                     other than the first     Ebox/Mbox issues the WRITE_UNLOCK since data for
                     one fails                the DREAD_LOCK was returned.

                     read for write or        Cbox asserts C%CBOX_H_ERR_H, puts backup cache into
                     write-unlock,            ETM. When the error is detected, write data has
                     valid-owned hit          already been merged with the corrupted data. The
                                              Cbox inverts two of the ECC check bits (bits 3,7)
                                              which gives a high probability that when the data
                                              is read again, an uncorrectable error will be
                                              detected. See description after this table.

                     miss                     No error is reported.

Uncorrectable ECC    read for Mbox            Cbox asserts C%CBOX_S_ERR_H, puts backup cache into
error in the tag                              ETM. The read is sent to memory; if the backup
store (includes                               cache actually owned the block the read will time
addressing errors)                            out. If fill data is returned, the fill is done to
                                              the Bcache and the fill data is sent to the Mbox.

                     write                    Cbox asserts C%CBOX_S_ERR_H, puts backup cache into
                                              ETM. The Oread for the write is sent to memory. If
                                              the cache actually owned the block, the read will
                                              time out and the write will then be sent to memory.
                                              The write will then time out as well unless error
                                              handling software cleans up the problem. If the
                                              cache did not own the block, the Oread will
                                              complete, the write will be merged with it, and the
                                              merged data will be written to the cache.

                     WRITE_UNLOCK             No tag store lookup is done, so this case does not
                                              occur.

                     cache coherence          Cbox asserts C%CBOX_S_ERR_H, puts backup cache into
                     transaction              ETM. Transaction is treated as a miss with regard
                                              to the backup cache; the invalidate is forwarded to
                                              the Mbox if the cache coherence transaction was due
                                              to an OREAD or a WRITE.

One action noted in the table deserves further explanation. When an uncorrectable ECC error is
detected in the data RAMs during a read-modify-write, the Bcache controller has already begun 
to write the new data into the cache, overwriting the errored data. The new data may have been 
corrupted by the errored data which was read from the cache. If this were allowed to be written 
into the cache with correct ECC, it might be read back later with no errors and incorrect data 
would be returned to the CPU. 

In order to prevent this from occurring, the Bcache controller inverts two of the checkbits which 
are being written to the cache to deliberately cause errored data to be written. This increases 
the likelihood that when the data is read back, an uncorrectable error will be detected whether 
the data is read back as written or with single-bit or multiple-bit errors. 

Due to layout constraints, only checkbits 3, 6, and 7 were potential candidates to be inverted in 
the circumstance described. The probabilities for reading the data back as uncorrectable are 
shown in Table 13-59. 



Table 13-59: Probability of reading data with an uncorrectable error after writing it with 
inverted checkbits 

Bits       no error    single bit    double bit    triple single    quad single
Inverted   read back   error read    error read    nibble error     nibble error
                       back          back          read back        read back

3,6        1.00        .3425         .9909         .4306            .6111
3,7        1.00        .3973         .9916         .4861            .6667
6,7        1.00        .1233         .9878         1.0000           1.0000
3,6,7      0.00        .9863         .4429         1.0000           1.0000



Choosing bits 3 and 7 results in uncorrectable errors a high percentage of the time if you assume 
a high likelihood that the data will be read back with no error (as it would be if the original 
error were transient) or with a double-bit error (as it would be if the original error were a hard 
double-bit error). 
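
The reasoning behind the choice of bits 3 and 7 can be checked with a short sketch. The probabilities are copied from Table 13-59; the dictionary layout, the key names, and the helper `worst_case` are illustrative assumptions and are not part of the NVAX design. The sketch assumes, as the text does, that the two read-back cases of interest are "no error" and "double-bit error", and picks the candidate whose lower probability across those two cases is highest.

```python
# Probability of detecting an uncorrectable error on read-back,
# copied from Table 13-59. Keys are the inverted checkbits.
P_UNCORRECTABLE = {
    "3,6":   {"none": 1.00, "single": 0.3425, "double": 0.9909},
    "3,7":   {"none": 1.00, "single": 0.3973, "double": 0.9916},
    "6,7":   {"none": 1.00, "single": 0.1233, "double": 0.9878},
    "3,6,7": {"none": 0.00, "single": 0.9863, "double": 0.4429},
}


def worst_case(bits, cases=("none", "double")):
    """Lowest detection probability over the read-back cases considered likely."""
    return min(P_UNCORRECTABLE[bits][case] for case in cases)


# Pick the candidate with the best worst-case detection probability.
best = max(P_UNCORRECTABLE, key=worst_case)
print(best, worst_case(best))
```

Under these assumptions the sketch selects the same pair as the text. Weighting the single-bit case heavily would favor inverting all three bits instead, which is why the stated assumption about likely read-back cases matters.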

13.9.9 Backup Cache Errors Incurred While in Error Transition Mode 

Table 13-60 describes error handling when the backup cache is already in ETM. 

NOTE 

The table below only describes ETM error cases which differ from error handling when 
the cache is in normal mode. 



Table 13-60: Backup Cache ECC Error handling during ETM 

General Problem        Specific Situation   Action Taken by NVAX CPU

Correctable ECC        WRITE_UNLOCK         C%CBOX_S_ERR_H. The error is corrected
error in the tag                            and the WRITE_UNLOCK is handled as it normally is
store                                       in ETM: it is written to the Bcache if it hits owned, and
                                            it is written to memory if it misses or hits valid.

Uncorrectable ECC      read for Mbox        Cbox asserts C%CBOX_S_ERR_H, puts backup cache into ETM.
error in the                                The read is sent to memory; if the backup cache actually
tag store (includes                         owned the block the read will time out. If fill data is
addressing errors)                          returned, the fill is not done to the Bcache but is sent to
                                            the Mbox.

                       write                C%CBOX_S_ERR_H. The write is sent to memory. If the
                                            cache actually owned the block, the write will time out
                                            in the memory interface unless software forces the Cbox
                                            to disown the block. If the cache did not own the block,
                                            the system handles the write as it normally does for a
                                            cache which is off.

                       WRITE_UNLOCK         C%CBOX_S_ERR_H. The write is sent to memory as a
                                            QW WDISOWN. Since the READ_LOCK was done just
                                            previously, memory always believes that we own the
                                            block. In most cases, the cache itself does not have
                                            a record of owning the block since a READ_LOCK to
                                            an owned block during ETM forces a writeback of the
                                            block. In these cases the WRITE_UNLOCK handling
                                            is very consistent. There is only one case where the
                                            cache does own the block: if we entered ETM on or after
                                            the READ_LOCK and before the WRITE_UNLOCK. In
                                            this case, the cache may contain previously written data
                                            which is not now reflected into memory. This may be
                                            handled by software.



13.9.10 NDAL Parity Errors 

The Cbox response to NDAL parity errors is described in Chapter 3. 

13.10 Testability 

The testability features provided in the Cbox make key Cbox control visible for debug purposes. 
The testability features do not specifically address fault coverage for manufacturing, since Cbox 
activity is very visible on the NDAL and cache interface pins. 

Many of the Cbox IPRs should be useful for testing and debug. The IPRs are described in 
Section 13.5. This section describes additional Cbox testability features. 

13.10.1 Parallel port 

The parallel port is useful for real-time debugging and for manufacturing test. The Cbox does 
not control any nodes using the parallel port; it is used for observation only. C%PP_DATA_H<11:7> 
are driven as shown in Table 13-61. The Mbox contains the circuitry which enables 
C%PP_DATA_H<11:7> to drive the parallel port when T%MBOX_DR_PP_H is asserted. 

Table 13-61: Cbox Parallel Port Connections 

Parallel port
signal              Cbox signal     Cbox Signal Meaning

C%PP_DATA_H<11>     BC_TS_CMD<2>    Given in Table 13-62
C%PP_DATA_H<10>     BC_TS_CMD<1>    Given in Table 13-62
C%PP_DATA_H<9>      BC_TS_CMD<0>    Given in Table 13-62
C%PP_DATA_H<8>      DEALLOC         Asserted when the tag store starts a deallocate.
C%PP_DATA_H<7>      BCHIT           Backup cache hit; factors in the type of request with VALID,
                                    OWNED, and the result of the tag compare.


BC_TS_CMD<2:0> is decoded as follows: 


Table 13-62: Interpretation of BC_TS_CMD<2:0> 

BC_TS_CMD   Name          Tag store operation

000         DREAD         Data-stream tag lookup
001         IREAD         Instruction-stream tag lookup
010         OREAD         Ownership-read tag lookup for a write or a READ_LOCK
011         WUNLOCK       Ownership-read tag lookup for a WRITE_UNLOCK (done only under
                          ETM)
100         R_INVAL       Cache coherency tag lookup as the result of NDAL DREAD or IREAD
101         O_INVAL       Cache coherency tag lookup as the result of NDAL OREAD or write
110         IPR_DEALLOC   Tag lookup for an explicit IPR deallocate operation
111         unused
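
As an illustration of how a debug tool might interpret a sample of the five parallel port bits, the following minimal sketch decodes C%PP_DATA_H<11:7> according to Tables 13-61 and 13-62. The function name `decode_pp_data` and the convention that the sample is passed as a 5-bit integer with bit 4 holding <11> and bit 0 holding <7> are assumptions made for the example; they do not come from the specification.

```python
# Encodings of BC_TS_CMD<2:0>, copied from Table 13-62.
BC_TS_CMD_NAMES = {
    0b000: "DREAD",
    0b001: "IREAD",
    0b010: "OREAD",
    0b011: "WUNLOCK",
    0b100: "R_INVAL",
    0b101: "O_INVAL",
    0b110: "IPR_DEALLOC",
    0b111: "unused",
}


def decode_pp_data(sample):
    """Decode a 5-bit sample of C%PP_DATA_H<11:7>.

    Assumed packing (hypothetical): bit 4 = <11>, ..., bit 0 = <7>.
    """
    if not 0 <= sample < 32:
        raise ValueError("sample must fit in 5 bits")
    cmd = (sample >> 2) & 0b111          # <11:9> carry BC_TS_CMD<2:0>
    return {
        "bc_ts_cmd": BC_TS_CMD_NAMES[cmd],
        "dealloc": bool((sample >> 1) & 1),  # <8>
        "bchit": bool(sample & 1),           # <7>
    }


print(decode_pp_data(0b01001))
```

The example sample decodes as an OREAD tag lookup that hit in the backup cache with no deallocate started.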

13.10.2 Internal scan chain 

A scan chain is provided on both entries of the FILL_CAM. A Linear Feedback Shift Register is 
provided on this scan chain. This serves two purposes: it helps the debug effort and it increases 
fault coverage in manufacturing. The scan chain bits are loaded when M%C_ISR_LOAD_L is 
asserted; they are shifted out when it is deasserted. The LFSR is enabled when M%C_ISR_LFSR_L 
is asserted. When M%C_ISR_LFSR_L is not asserted, the scan chain becomes an observe-only 
register. 

The FILL_CAM gives cycle-by-cycle information on what is happening in the Cbox, as every 
potential cache miss is loaded into the FILL_CAM before the miss actually occurs. There is 
information relating to cache coherency requests as well. 

The Cbox scan chain covers the following bits of the FILL_CAM: 



Table 13-63: FILL_CAM scan chain 

Name              Extent  Type  Description

RDLK_0            0       WC    Indicates that the outstanding read is a READ_LOCK.
IREAD_0           1       RO    This is an Istream read from the Mbox which may be aborted.
OREAD_0           2       RO    This is an outstanding OREAD.
WRITE_0           3       RO    This read was done for a write.
TO_MBOX_0         4       RO    Data is to be returned to the Mbox.
RIP_0             5       RO    READ invalidate pending.
OIP_0             6       RO    OREAD invalidate pending.
DNF_0             7       RO    Do not fill - data not to be written into the cache or validated
                                when the fill returns.
RDLK_FL_DONE_0    8       RO    Indicates that the last fill for a READ_LOCK arrived.
REQ_FILL_DONE_0   9       RO    Indicates that the requested quadword was successfully received.
COUNT_0           11:10   RO    How many of the fill quadwords have been returned successfully.
VALID_0           12      WC    Indicates that an error occurred and the register is locked.
RDLK_1            13      WC    Indicates that the outstanding read is a READ_LOCK.
IREAD_1           14      RO    This is an Istream read from the Mbox which may be aborted.
OREAD_1           15      RO    This is an outstanding OREAD.
WRITE_1           16      RO    This read was done for a write.
TO_MBOX_1         17      RO    Data is to be returned to the Mbox.
RIP_1             18      RO    READ invalidate pending.
OIP_1             19      RO    OREAD invalidate pending.
DNF_1             20      RO    Do not fill - data not to be written into the cache or validated
                                when the fill returns.
RDLK_FL_DONE_1    21      RO    Indicates that the last fill for a READ_LOCK arrived.
REQ_FILL_DONE_1   22      RO    Indicates that the requested quadword was successfully received.
COUNT_1           24:23   RO    How many of the fill quadwords have been returned successfully.
VALID_1           25      WC    Indicates that an error occurred and the register is locked.



There are two FILL_CAM entries. Thirteen bits in each are covered, for a total of 26 bits in this 
scan path. 

The Cbox scan chain is connected in the order shown in the table, with bit <0> shifted out first 
and sent to the Mbox scan chain. When the Cbox scan chain is in shift mode, a "0" is shifted 
into bit <25> of the Cbox scan chain. Bit <0> is driven onto C%ISR2_TDO_H, which is input to the 
Mbox scan chain. 
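
The layout of the 26-bit scan path can be made concrete with a small sketch that turns a captured bit stream back into the two FILL_CAM entries. The field offsets follow Table 13-63 and the shift order follows the paragraph above (bit <0> first). The names `FIELDS`, `bits_to_word`, and `unpack_fill_cam` are hypothetical helpers for a debug script, and treating the higher-numbered bit of COUNT as the most significant bit is an assumption based on the usual <msb:lsb> notation.

```python
# (name, offset within a 13-bit entry, width), following Table 13-63.
FIELDS = [
    ("RDLK", 0, 1), ("IREAD", 1, 1), ("OREAD", 2, 1), ("WRITE", 3, 1),
    ("TO_MBOX", 4, 1), ("RIP", 5, 1), ("OIP", 6, 1), ("DNF", 7, 1),
    ("RDLK_FL_DONE", 8, 1), ("REQ_FILL_DONE", 9, 1),
    ("COUNT", 10, 2), ("VALID", 12, 1),
]
ENTRY_BITS = 13
CHAIN_BITS = 2 * ENTRY_BITS  # 26 bits in the Cbox scan path


def bits_to_word(shifted_out):
    """Pack bits in shift-out order (bit <0> arrives first) into an integer."""
    if len(shifted_out) != CHAIN_BITS:
        raise ValueError("expected 26 scan bits")
    word = 0
    for position, bit in enumerate(shifted_out):
        word |= (bit & 1) << position
    return word


def unpack_fill_cam(word):
    """Split the 26-bit scan word into the two FILL_CAM entries."""
    entries = []
    for entry in range(2):
        base = entry * ENTRY_BITS
        entries.append({
            name: (word >> (base + offset)) & ((1 << width) - 1)
            for name, offset, width in FIELDS
        })
    return entries


# Example: entry 0 is an outstanding OREAD with two quadwords returned;
# entry 1 has only VALID set.
word = (1 << 2) | (0b10 << 10) | (1 << 25)
entry0, entry1 = unpack_fill_cam(word)
print(entry0["OREAD"], entry0["COUNT"], entry1["VALID"])
```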

13.11 Performance Monitoring 

The Cbox sends two signals, C%PMUX0_H and C%PMUX1_H, to the 
performance counters. CCTL<PM_ACCESS_TYPE> controls the mux which outputs C%PMUX0_H. 
CCTL<PM_HIT_TYPE> controls the mux which outputs C%PMUX1_H. 

The correspondence between CCTL<PM_ACCESS_TYPE> and C%PMUX0_H is shown in 
Table 13-64. 



Table 13-64: Cbox Performance Monitoring Control 

CCTL:                   Signal muxed
PM_ACCESS_TYPE<13:11>   onto C%PMUX0_H   Signal functionality

000                     BC_COH           Bcache coherency access (as a result of an NDAL
                                         DREAD, IREAD, OREAD, or WRITE)
001                     BC_COH_READ      Bcache coherency access as a result of an NDAL
                                         DREAD or IREAD
010                     BC_COH_OREAD     Bcache coherency access as a result of an NDAL
                                         OREAD or WRITE
011                     unused
100                     BC_CPU           Bcache CPU access (as a result of an NVAX Iread,
                                         Dread, or Oread)
101                     BC_CPU_IREAD     Bcache CPU access as a result of an NVAX Iread
110                     BC_CPU_DREAD     Bcache CPU access as a result of an NVAX Dread
                                         or Dread-modify
111                     BC_CPU_OREAD     Bcache CPU access as a result of an NVAX Oread
                                         due to a read lock, a write, or a write unlock.


The correspondence between CCTL<PM_HIT_TYPE> and C%PMUX1_H is shown in Table 13-65. 


Table 13-65: Cbox Performance Monitoring Control 

CCTL:                Signal muxed
PM_HIT_TYPE<15:14>   onto C%PMUX1_H   Signal functionality

00                   BC_HIT           Bcache hit; factors in VALID and OWNED as
                                      necessary, based on the transaction.
01                   BC_HIT_OWNED     Bcache hit owned; tag matched, VALID and
                                      OWNED were set.
10                   BC_HIT_VALID     Bcache hit valid; tag matched, VALID was set,
                                      OWNED was either set or clear.
11                   BC_MISS_OWNED    Bcache miss; tag did not match, VALID and
                                      OWNED were set (triggers writeback).



The HIT signals which produce C%PMUX1_H are valid during the same cycle in which the ACCESS 
signals which produce C%PMUX0_H are asserted. They must be valid at the same time because in 
the central performance monitoring hardware, C%PMUX1_H is conditioned with C%PMUX0_H. 
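
A minimal sketch of how software might compose the two selector fields is shown below. The bit positions (PM_ACCESS_TYPE in CCTL<13:11>, PM_HIT_TYPE in CCTL<15:14>) and the encodings come from Tables 13-64 and 13-65; the helper names and the idea of manipulating a plain integer copy of CCTL are assumptions for illustration only, and the rest of the CCTL layout (Section 13.5.1) is not modeled.

```python
PM_ACCESS_TYPE_LSB, PM_ACCESS_TYPE_WIDTH = 11, 3   # CCTL<13:11>
PM_HIT_TYPE_LSB, PM_HIT_TYPE_WIDTH = 14, 2         # CCTL<15:14>

ACCESS_SIGNALS = {
    0b000: "BC_COH", 0b001: "BC_COH_READ", 0b010: "BC_COH_OREAD",
    0b011: "unused", 0b100: "BC_CPU", 0b101: "BC_CPU_IREAD",
    0b110: "BC_CPU_DREAD", 0b111: "BC_CPU_OREAD",
}
HIT_SIGNALS = {
    0b00: "BC_HIT", 0b01: "BC_HIT_OWNED",
    0b10: "BC_HIT_VALID", 0b11: "BC_MISS_OWNED",
}


def _insert(value, field, lsb, width):
    """Replace one bit field in value, leaving all other bits unchanged."""
    mask = ((1 << width) - 1) << lsb
    if field >> width:
        raise ValueError("field value too wide")
    return (value & ~mask) | (field << lsb)


def set_pm_fields(cctl, access_type, hit_type):
    """Return a copy of cctl with both performance monitoring selectors set."""
    cctl = _insert(cctl, access_type, PM_ACCESS_TYPE_LSB, PM_ACCESS_TYPE_WIDTH)
    return _insert(cctl, hit_type, PM_HIT_TYPE_LSB, PM_HIT_TYPE_WIDTH)


def decode_pm_fields(cctl):
    """Name the signals currently muxed onto C%PMUX0_H and C%PMUX1_H."""
    access = (cctl >> PM_ACCESS_TYPE_LSB) & 0b111
    hit = (cctl >> PM_HIT_TYPE_LSB) & 0b11
    return ACCESS_SIGNALS[access], HIT_SIGNALS[hit]


# Select NVAX Dread accesses that hit owned in the Bcache.
cctl = set_pm_fields(0, 0b110, 0b01)
print(hex(cctl), decode_pm_fields(cctl))
```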

13.12 Initialization 

When the CPU powers up, K_C%RESET_L and K%RESET_CCTL_L are asserted, clearing the main 
queues and latches in the Cbox and putting the Cbox state machines into their idle states. The 
only Cbox IPR which is initialized on reset is the Cbox control register, CCTL. It is initialized as 
described in Section 13.5.1. 

K_C%RESET_L is also asserted when the Ebox timeout counter expires. At this time 
K%RESET_CCTL_L is not asserted. Thus, the Cbox is initialized just as on power-up except that 
CCTL is not changed. 

K_C%RESET_L must be asserted for 18 internal cycles (6 NDAL cycles) in order to properly reset 
the Cbox. 

The backup cache must be initialized and turned on as described in Section 13.9.6. Software 
must write CCTL to the desired state. The W1C error registers should be cleared so that they 
are starting with no error bits set. 

When the CPU powers up, K%EXT_RESET_L is asserted which puts the pads into their reset state: 

• Tristates P%NDAL_H<63:0>, P%CMD_H<3:0>, P%ID_H<2:0>, and P%PARITY_H<2:0>. 
This occurs when internal reset is asserted, and is not qualified with any clock. 

• Releases P%ACK_L. This occurs when internal reset is asserted, and is not qualified with any 
clock. 

• Deasserts P%CPU_REQ_L, P%CPU_HOLD_L, and P%CPU_SUPPRESS_L. This occurs 
when K%EXT_RESET_L is asserted, and is not qualified with any clock. 

• Deasserts P%TS_OE_L, P%TS_WE_L, P%DR_OE_L, and P%DR_WE_L. This occurs when 
K%EXT_RESET_L is asserted, and is not qualified with any clock. 

• Tristates P%TS_TAG_H<31:17>, P%TS_ECC_H<5:0>, P%TS_VALID_H, 
P%TS_OWNED_H, P%DR_DATA_H<63:0>, and P%DR_ECC_H<7:0>. This occurs when 
K%EXT_RESET_L is asserted, and is not qualified with any clock. 

13.13 Cbox Interfaces 

The Cbox interfaces with the Mbox, the NDAL, the backup cache, the Interrupt section, and the 
Clock section. The signals the Cbox uses for each of these interfaces are listed here. 



Table 13-66: CBOX interface signals 

Signal                          Number  I/O  Description

NDAL SIGNALS (80 total)

P%CPU_REQ_L                     1       O    Requests the NDAL.
P%CPU_HOLD_L                    1       O    Holds the NDAL.
P%CPU_GRANT_L                   1       I    Grants NVAX the NDAL.
P%CPU_SUPPRESS_L                1       O    Suppresses the NDAL.
P%CPU_WB_ONLY_L                 1       I    Suppresses non-writeback NVAX transactions.
P%NDAL_H<63:0>                  64      I/O  NDAL address/data, multiplexed lines.
P%CMD_H<3:0>                    4       I/O  NDAL command.
P%ID_H<2:0>                     3       I/O  Identifies the NDAL driver.
P%PARITY_H<2:0>                 3       I/O  Parity on the NDAL.
P%ACK_L                         1       I/O  Acknowledges NDAL cycles as correctly received.

BACKUP CACHE TAG STORE SIGNALS (41 total)

P%TS_INDEX_H<20:5>              16      O    Index into the tag store.
P%TS_OE_L                       1       O    Tag Store Output Enable.
P%TS_WE_L                       1       O    Tag Store Write Enable.
P%TS_TAG_H<31:17>               15      I/O  Backup cache tag.
P%TS_ECC_H<5:0>                 6       I/O  Tag store ECC.
P%TS_OWNED_H                    1       I/O  Indicates ownership of the block.
P%TS_VALID_H                    1       I/O  Indicates the block is valid.

BACKUP CACHE DATA RAM SIGNALS (92 total)

P%DR_INDEX_H<20:3>              18      O    Index into the data RAMs.
P%DR_OE_L                       1       O    Data RAM output enable.
P%DR_WE_L                       1       O    Data RAM write enable.
P%DR_DATA_H<63:0>               64      I/O  Backup cache data.
P%DR_ECC_H<7:0>                 8       I/O  Backup cache data ECC.

Table 13-66 (Cont.): CBOX interface signals 

Signal                          Number  I/O  Description

CLOCK PINS (4 total)

P%PHI12_IN_H                    1       I    NDAL clock used in the pads.
P%PHI23_IN_H                    1       I    NDAL clock used in the pads.
P%PHI34_IN_H                    1       I    NDAL clock used in the pads.
P%PHI41_IN_H                    1       I    NDAL clock used in the pads.

CLOCK SECTION INTERFACE (5 total)

K_MCB%PHI_1_H                   1            Clock used in the Cbox.
K_MCB%PHI_2_H                   1            Clock used in the Cbox.
K_MCB%PHI_3_H                   1            Clock used in the Cbox.
K_MCB%PHI_4_H                   1            Clock used in the Cbox.
K_PAD%PHI_1_H                   1            Clock used in the upper pad ring.
K_PAD%PHI_3_H                   1            Clock used in the upper pad ring.
K_PAD%PHI_4_H                   1            Clock used in the upper pad ring.
K_PADL%PHI_1_H                  1            Clock used in the lower pad ring.
K_PADL%PHI_2_H                  1            Clock used in the lower pad ring.
K_PADL%PHI_3_H                  1            Clock used in the lower pad ring.
K_PADL%PHI_4_H                  1            Clock used in the lower pad ring.
K%EXT_RESET_L                   1            Puts the cache and NDAL pads into their reset
                                             state.
K_C%RESET_L                     1            Resets the Cbox except for CCTL.
K%RESET_CCTL_L                  1            Resets the Cbox control register, CCTL.
K_CE%RESET_H                    1            Resets the BIU cycle counter which relates internal
                                             to external time.

EBOX INTERFACE SIGNALS (2 total)

C%CBOX_H_ERR_H                  1       O    Indicates a hard error in the backup cache or on the
                                             NDAL.
C%CBOX_S_ERR_H                  1       O    Indicates a soft error in the backup cache or on the
                                             NDAL.
E%TIMEOUT_BASE_H                1       I    Controls the NDAL read timeout counters.
E%TIMEOUT_ENABLE_H              1       I    Controls the NDAL read timeout counters.

Table 13-66 (Cont.): CBOX interface signals 

Signal                          Number  I/O  Description

TEST AND PERFORMANCE MONITORING SIGNALS (10 total)

C%PP_DATA_H<11:7>               5       O    Cbox internal state, driven to the Mbox, where it is
                                             driven to the parallel port when selected.
C%ISR2_TDO_H                    1       O    Cbox internal scan chain output which hooks up to
                                             the Mbox scan chain.
M%C_ISR_LOAD_L                  1       I    Tells the Cbox LFSR/internal scan chain whether
                                             to load or shift.
M%C_ISR_LFSR_L                  1       I    Puts the Cbox LFSR/internal scan chain into Linear
                                             Feedback Shift Register mode.
T_JTG%DRCLK_L                   1       I    Clocks the boundary scan cells.
T_JTG%DRCLK_H                   1       I    Clocks the boundary scan cells.
T_JTG%CAPTURE_L                 1       I    When asserted, the boundary scan cells are in load
                                             mode; otherwise, they are in shift mode.
T_JTG%BSR_EXTEST_L              1       I    When asserted, the pins are driven with data from
                                             the boundary scan cells rather than with NVAX
                                             internal data.
T_JTG%BSR_UPDATE_L              1       I    Controls the update of the cache I/O pads, when
                                             driven by JTAG.
C_PAD_N%BSR_NDAL_H<63>          1       O    Boundary scan chain output from the Cbox pads.
E_PAD_INT%BSR_MACHINE_CHECK_L   1       I    Boundary scan chain input from the Ebox pads.
K_PAD_CK2%DISABLE_OUT_H         1       I    Asynchronously disables all NVAX outputs
                                             from driving; equivalent to the inversion of
                                             P%DISABLE_OUT_L.
C%PMUX0_H                       1       O    Cbox performance monitoring output.
C%PMUX1_H                       1       O    Cbox performance monitoring output.

Table 13-66 (Cont.): CBOX interface signals 

Signal                          Number  I/O  Description

MBOX INTERFACE SIGNALS (157 total)

M%S6_CMD_H<4:0>                 5       I    Mbox reference command field.
M%S6_PA_H<31:3>                 29      I    Physical address of Mbox reference.
M%C_S6_PA_H<2:0>                3       I    Physical address of Mbox reference, lower three
                                             bits.
M%S6_BYTE_MASK_H<7:0>           8       I    Byte enable field of Mbox reference.
M%CBOX_REF_ENABLE_L             1       I    Indicates that the current S6 reference packet
                                             should be latched and processed by the Cbox; not
                                             asserted for writes as all writes are processed by
                                             the Cbox.
M%CBOX_LATE_EN_H                1       I    This is equivalent to M%CBOX_REF_ENABLE_L, but
                                             driven to the Cbox with later timing, after the Mbox
                                             detects a Pcache parity error. It indicates that
                                             the S6 reference packet should be processed by the
                                             Cbox.
M%ABORT_CBOX_IRD_H              1       I    Indicates that any IREAD which the Cbox may be
                                             processing should be immediately terminated.
M%CBOX_BYPASS_ENABLE_H          1       I    Indicates that the Cbox may drive B%S6_DATA_H<63:0>
                                             during the following cycle in order to attempt a fill
                                             data bypass.
B%S6_DATA_H<63:0>               64      I/O  Bus used to receive data from the Mbox and to send
                                             data to the Mbox.
C%S6_DP_H<7:0>                  8       O    Byte data parity for B%S6_DATA_H<63:0>.
C%CBOX_CMD_H<1:0>               2       O    Command field of Cbox reference sent to Mbox.
C%CBOX_ADDR_H<31:5>             27      O    Hexaword address for invalidate sent to Mbox.
C%MBOX_FILL_QW_H<4:3>           2       O    Address bits to indicate to which quadword within
                                             the hexaword the current fill data belongs.
C%REQ_DQW_H                     1       O    Indicates that the requested quadword of data is
                                             being returned. This is asserted for both DREADs
                                             and IREADs; it is also asserted if a hard error
                                             occurs on fill data and the requested quadword has
                                             not yet been returned.
C%LAST_FILL_H                   1       O    Indicates that this is the last fill sent for the read
                                             being processed.
C%CBOX_HARD_ERR_H               1       O    Indicates that a hard error is associated with the
                                             data being returned. The Mbox treats this as a fill
                                             with an error.
C%CBOX_ECC_ERR_H                1       O    Indicates that an ECC error is associated with the
                                             data being returned. The Mbox ignores the data
                                             and waits for another fill from the Cbox.
C%WR_BUF_BACK_PRES_H            1       O    Indicates that the Cbox cannot accept any more
                                             entries in its WRITE_QUEUE.

13.14 Resolved Issues 

1. Issue: Does the Cbox need to check for conflicts between writes into the Istream and IREADs? 
Resolution: Yes, it does. The following case illustrates why. 

Suppose that the Cbox did not check for conflicts between writes and Istream reads. Also 
note that the SRM requires that an REI be done after any write into the instruction stream. 
REI flushes all write buffers and flushes the VIC. 

Suppose that the Ibox is prefetching and issues IREAD A, IREAD A+1. A and A+1 are 
adjacent hexawords. Around the same time the Ebox is doing unaligned WRITE A,A+1,REI 
which was caused by Istream previous to that now being fetched by the Ibox. 

Suppose the sequence as seen by the Mbox is IREAD A, unaligned WRITE A, A+1, IREAD 
A+1, REI. The first IREAD is prefetching Istream data and should retrieve the new A. If 
the first IREAD misses in the VIC and the Pcache, the Bcache will return old data for the 
IREAD. The write will then be done into the Pcache, since it is write-through, and into the 
WRITE_QUEUE. At this point the new data for A is in the Pcache. 

Now the second IREAD misses the Pcache and appears in the IREAD_LATCH in the Cbox. 
It is serviced before the write since no conflict checking is done for IREADs, and they take 
priority over writes. Old data is returned to the Pcache for the second IREAD. Then the 
Clear Write Buffer command appears in the Cbox because the Ebox is executing the REI so 
the write is done. 

At this point the VIC has old data for the IREADs. This is OK because the REI flushes the 
VIC. Location A is updated in the Pcache because the write was done after the first IREAD. 
However, the Pcache has old data for A+1 because the Bcache returned the old data after the 
write missed into the Pcache. 

When the Istream re-fetches A+1, it will get old data from the Pcache. This is not the behavior 
we want. Thus, the Cbox implements conflict checking for IREADs and prevents the IREAD 
of A+1 from bypassing the write to A+1. 
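
The conflict check in this resolution can be pictured with a toy model. This is only a sketch: the function names are hypothetical, and comparing at hexaword granularity (address bits <31:5>, 32 bytes per hexaword) is an assumption made for the example rather than a statement of how the Cbox compares addresses.

```python
HEXAWORD_SHIFT = 5  # a hexaword is 32 bytes, so bits <4:0> are the offset


def hexaword(addr):
    """Hexaword number of a byte address."""
    return addr >> HEXAWORD_SHIFT


def iread_conflicts(iread_addr, pending_write_addrs):
    """True if the IREAD targets a hexaword with a write still queued.

    In this toy model a conflicting IREAD must wait for the write
    instead of bypassing it.
    """
    target = hexaword(iread_addr)
    return any(hexaword(w) == target for w in pending_write_addrs)


A = 0x1000                      # hexaword A
A_PLUS_1 = A + 0x20             # adjacent hexaword A+1
# Unaligned write spanning A and A+1 leaves two queued pieces.
write_queue = [A + 0x1E, A_PLUS_1]

print(iread_conflicts(A_PLUS_1, write_queue))   # IREAD A+1 must not bypass
print(iread_conflicts(A + 0x40, write_queue))   # unrelated hexaword
```

With the check in place, the IREAD of A+1 in the scenario above waits behind the queued write, so the Pcache is filled with the new data.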

2. Issue: Is it ok that the Cbox reorders I/O space writes with respect to memory space writes? 
Resolution: Yes, it is OK per VAX ECO 95, Allow Write-and-Run to I/O space. 

This is the scenario where the Cbox may reorder I/O space writes with respect to memory 
writes: The Mbox issues Memory Write A followed by I/O Write B. Memory Write A hits owned 
in the backup cache and is written. I/O Write B goes to the NON_WRITEBACK_QUEUE. 

The NDAL is busy or P%CPU_WB_ONLY_L is asserted, so I/O Write B stays 
in the NON_WRITEBACK_QUEUE. Meantime, a cache coherency request arrives for 
memory location A. The data is retrieved from the backup cache and put into the 
WRITEBACK_QUEUE. 

Since the WRITEBACK_QUEUE contains a cache coherency request (or P%CPU_WB_ONLY_L 
is asserted), the WRITEBACK_QUEUE has priority over the NON_WRITEBACK_QUEUE. 
Therefore Memory Data A reaches the NDAL before I/O Write B, effectively reordering the 
writes. 
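
The reordering follows directly from the queue priority, which the following toy arbiter illustrates. It is a sketch under simplifying assumptions: the function `drain_to_ndal` and the string labels are invented for the example, NDAL arbitration and timing are ignored, and the only rules modeled are that the WRITEBACK_QUEUE is served first and that the NON_WRITEBACK_QUEUE is held off while P%CPU_WB_ONLY_L is asserted.

```python
from collections import deque


def drain_to_ndal(writeback_q, non_writeback_q, wb_only=False):
    """Return the order in which queued transactions reach the NDAL.

    wb_only models P%CPU_WB_ONLY_L being asserted: only writebacks
    may be issued, so non-writeback entries stay queued.
    """
    order = []
    while writeback_q or (non_writeback_q and not wb_only):
        if writeback_q:                       # writebacks always win
            order.append(writeback_q.popleft())
        else:
            order.append(non_writeback_q.popleft())
    return order


# I/O Write B was queued first; the writeback of A arrived later.
non_writeback_q = deque(["I/O Write B"])
writeback_q = deque(["Memory Data A (writeback)"])
order = drain_to_ndal(writeback_q, non_writeback_q)
print(order)
```

Even though I/O Write B entered its queue earlier, the data for location A is issued first, which is the reordering that the resolution accepts.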

13.15 NVAX CBOX Signal Name Cross-Reference 

All CBOX signal names and pin names referenced in this chapter have appeared in bold and reflect 
the actual name appearing in the NVAX schematic set, with the exception of K%EXT_RESET_L, 
which is a behavioral model name only. For each signal and pin appearing in this chapter, the 
table below lists the corresponding name which exists in the behavioral model. 



Table 13-67: Cross-reference of all names appearing in the CBOX chapter 

Schematic Name                    Behavioral Model Name

B%S6_DATA_H<63:0>                 B%S6_DATA_H<63:0>
C%CBOX_ADDR_H<31:5>               C%CBOX_ADDR_H<31:5>
C%CBOX_CMD_H<1:0>                 C%CBOX_CMD_H<1:0>
C%CBOX_ECC_ERR_H                  C%CBOX_ECC_ERR_H
C%CBOX_HARD_ERR_H                 C%CBOX_HARD_ERR_H
C%CBOX_H_ERR_H                    C%CBOX_H_ERR_H
C%CBOX_S_ERR_H                    C%CBOX_S_ERR_H
C%ISR2_TDO_H                      C%ISR2_TDO_H
C%LAST_FILL_H                     C%LAST_FILL_H
C%MBOX_FILL_QW_H<4:3>             C%MBOX_FILL_QW_H<4:3>
C%PMUX0_H                         C%PMUX0_H
C%PMUX1_H                         C%PMUX1_H
C%PP_DATA_H<11:7>                 C%PP_DATA_H<11:7>
C%REQ_DQW_H                       C%REQ_DQW_H
C%S6_DP_H<7:0>                    C%S6_DP_H<7:0>
C%WR_BUF_BACK_PRES_H              C%WR_BUF_BACK_PRES_H
C_ADC%ABUS_H<31:0>                C_BUS%ABUS_H<31:0>
C_ADC%BIU_ADDR_OUT_H<31:0>        C_ADC%BIU_ADDR_OUT_H<31:0>
C_BIU%ABC_ADDR_IN_H<31:0>         C_BIU%ABC_ADDR_IN_H<31:0>
C_BIU%CYCLE_1_H                   C_BIU%CYCLE_1_H
C_BIU%CYCLE_2_H                   C_BIU%CYCLE_2_H
C_BIU%CYCLE_3_H                   C_BIU%CYCLE_3_H
C_BIU_NOC%BXI_TIMO_0_LAT_H        C_BIU_NOC%BXI_TIMO_0_LAT_H
C_BIU_NOC%BXI_TIMO_1_LAT_H        C_BIU_NOC%BXI_TIMO_1_LAT_H
C_BIU_NOC%BXI_TIMO_0_EN_H         C_BIU_NOC%BXI_TIMO_0_EN_H
C_BIU_NOC%BXI_TIMO_1_EN_H         C_BIU_NOC%BXI_TIMO_1_EN_H
C_BUS%BIU_DATA_H<63:0>            C_BUS%BIU_DATA_H<63:0>
C_BUS%DBUS_H<63:0>                C_BUS%DBUS_H<63:0>
C_PAD_N%BSR_NDAL_H<63>            T_BSR%NDAL_HI_H<63>
E%TIMEOUT_BASE_H                  E%TIMEOUT_BASE_H

Table 13-67 (Cont.): Cross-reference of all names appearing in the CBOX chapter 

Schematic Name                    Behavioral Model Name

E%TIMEOUT_ENABLE_H                E%TIMEOUT_ENABLE_H
E_PAD_INT%BSR_MACHINE_CHECK_L     T_BSR%MACHINE_CHECK_H
K%EXT_TMBS_H                      K_PAD%EXT_TMBS_H
K%RESET_CCTL_L                    K%RESET_L
K_C%RESET_L                       K_C%RESET_L
K_CE%RESET_H                      K_CE%RESET_H
K_MCB%PHI_1_H                     K%PHI_1_H
K_MCB%PHI_2_H                     K%PHI_2_H
K_MCB%PHI_3_H                     K%PHI_3_H
K_MCB%PHI_4_H                     K%PHI_4_H
K_PAD%PHI_1_H                     K%PHI_1_H
K_PAD%PHI_3_H                     K%PHI_3_H
K_PAD%PHI_4_H                     K%PHI_4_H
K_PADL%PHI_1_H                    K%PHI_1_H
K_PADL%PHI_2_H                    K%PHI_2_H
K_PADL%PHI_3_H                    K%PHI_3_H
K_PADL%PHI_4_H                    K%PHI_4_H
K_PAD_CK2%DISABLE_OUT_H           P%DISABLE_OUT_L
K_PAD%EXT_RESET_TOP_L             K%EXT_RESET_L
K_PAD%EXT_RESET_BOT_L             K%EXT_RESET_L
M%ABORT_CBOX_IRD_H                M%ABORT_CBOX_IRD_H
M%CBOX_BYPASS_ENABLE_H            M%CBOX_BYPASS_ENABLE_H
M%CBOX_LATE_EN_H                  M%CBOX_LATE_EN_H
M%CBOX_REF_ENABLE_L               M%CBOX_REF_ENABLE_H
M%C_ISR_LFSR_L                    T%ISR_LFSR_H
M%C_ISR_LOAD_L                    T%ISR_LOAD_H
M%C_S6_PA_H<2:0>                  M%C_S6_PA_H<2:0>
M%S6_BYTE_MASK_H<7:0>             M%S6_BYTE_MASK_H<7:0>
M%S6_CMD_H<4:0>                   M%S6_CMD_H<4:0>
M%S6_PA_H<31:3>                   M%S6_PA_H<31:3>
T%MBOX_DR_PP_H                    T%MBOX_DR_PP_H
T_JTG%BSR_EXTEST_L                T%BSR_EXTEST_H
T_JTG%BSR_UPDATE_L                T%BSR_UPDATE_H
T_JTG%CAPTURE_L                   T%CAPTURE_H
T_JTG%DRCLK_H                     T_JTG_TAP%DR_CLKEN_H
T_JTG%DRCLK_L                     T_JTG_TAP%DR_CLKEN_H

Table 13-67 (Cont.): Cross-reference of all names appearing in the CBOX chapter 

Schematic Name                    Behavioral Model Name

P%ACK_L                           P%ACK_L
P%CMD_H<3:0>                      P%CMD_H<3:0>
P%CPU_GRANT_L                     P%CPU_GRANT_L
P%CPU_HOLD_L                      P%CPU_HOLD_L
P%CPU_REQ_L                       P%CPU_REQ_L
P%CPU_SUPPRESS_L                  P%CPU_SUPPRESS_L
P%CPU_WB_ONLY_L                   P%CPU_WB_ONLY_L
P%DISABLE_OUT_L                   P%DISABLE_OUT_L
P%DR_DATA_H<63:0>                 P%DR_DATA_H<63:0>
P%DR_ECC_H<7:0>                   P%DR_ECC_H<7:0>
P%DR_INDEX_H<20:3>                P%DR_INDEX_H<20:3>
P%DR_OE_L                         P%DR_OE_L
P%DR_WE_L                         P%DR_WE_L
P%ID_H<2:0>                       P%ID_H<2:0>
P%NDAL_H<63:0>                    P%NDAL_H<63:0>
P%OSC_TC1_H                       P%OSC_TC1_H
P%PARITY_H<2:0>                   P%PARITY_H<2:0>
P%PHI12_IN_H                      P%PHI12_IN_H
P%PHI23_IN_H                      P%PHI23_IN_H
P%PHI34_IN_H                      P%PHI34_IN_H
P%PHI41_IN_H                      P%PHI41_IN_H
P%PHI12_OUT_H                     P%PHI12_OUT_H
P%TS_ECC_H<5:0>                   P%TS_ECC_H<5:0>
P%TS_INDEX_H<20:5>                P%TS_INDEX_H<20:5>
P%TS_OE_L                         P%TS_OE_L
P%TS_OWNED_H                      P%TS_OWNED_H
P%TS_TAG_H<31:17>                 P%TS_TAG_H<31:17>
P%TS_VALID_H                      P%TS_VALID_H
P%TS_WE_L                         P%TS_WE_L

13.16 Revision History 



Table 13-68: Revision History 

Who             When          Description of change

Rebecca Stamm   9-Oct-1991    Made the following change: Bcache data MUST be initialized with
                              correct ECC on powerup, contrary to what was in previous revisions.

Rebecca Stamm   16-Aug-1991   Minor updates and clarifications. RDE and UNEXPECTED_FILL are
                              both set if an unexpected RDE arrives on the NDAL. During ETM, a
                              read modify that does not hit owned causes a read to memory, NOT
                              an OREAD to memory. On uncorr error on RMW, checkbits 3 and 7
                              are inverted rather than 3,6,7. Added description of why the bits are
                              inverted.

Rebecca Stamm   20-Feb-1991   Correct TS_CMD and DR_CMD encodings. Clarify some sections.
                              Add description of NVAX-NDAL timing. Add statements that the
                              contents of the Cbox error registers are not changed during reset.
                              Added cache timing information. Added table of cache behavior while
                              it is ON. Appended P% to the beginning of all pin names, since those
                              match the schematics and the beh model. Add assertion levels to
                              signal names.

Rebecca Stamm   14-Aug-1990   Remove E%MEMORY_RESET, add K%RESET_CCTL_L.

Rebecca Stamm   4-Jul-1990    Correct description of K%MEMORY_RESET. Added K_CE%RESET_H. Added
                              CCTL<FORCE_NDAL_PERR>. Update description of Cbox behavior
                              when P%CPU_WB_ONLY_L is asserted. Update conditions for
                              servicing the write queue. Update cache coherency section with
                              bug correction. Added to cache ram speed table, 16ns. Clarify
                              CEFSTS<COUNT>. Clarify BCFLUSH during FORCE_HIT mode.
                              Update handling of DREAD lock which fails on an uncorrectable error
                              on the first quadword. Clarify handling of correctable error in the tag
                              store. Added section about the FILL_CAM and block conflicts.

Rebecca Stamm   3-Jun-1990    Clarify handling of write, readlock in ETM. Make
                              CEFSTS<UNEXPECTED_FILL> W1C.

Rebecca Stamm   17-May-1990   Clarify invalidate handling sections. Always give the
                              WRITEBACK_QUEUE priority over the NON_WRITEBACK_QUEUE. Change
                              bit definitions in CEFSTS. Change WR_MRG_DONE to REQ_FILL_DONE
                              in CEFSTS and FILL_CAM. Clarify stalling of IPR accesses to the
                              tag store while a FILL_CAM entry to the same block is valid.

Table 13-68 (Cont.): Revision History 

Who             When          Description of change

Rebecca Stamm   20-Feb-1990   Update error table. Add complete description of timeout counters.
                              Change CCTL<TIMEOUT_EXT> to CCTL<TIMEOUT_TEST>, update
                              description of that bit. Add E%TIMEOUT_BASE_H to Cbox interface
                              signal list. Add control signal names for scan chain, updated
                              scan chain section, removed two bits from the scan chain. Add
                              control signal names for parallel port, updated parallel port
                              section. Update description of CEFSTS RDLK bit. Clarified
                              description of CEFADR. Clarified tag store actions on
                              deallocates. Update performance monitoring hardware section
                              and added control bits to CCTL. Correct clock names. Bcache read
                              quadwords returned in wrapped order rather than in Gray code order.
                              WRITEBACK_QUEUE full prevents all transactions from starting.
                              Add BC_TS_CMD decodings for the parallel port. Added TS_CMD
                              encodings to BCETSTS. Added DR_CMD encodings to BCEDSTS.
                              More detail on NESTS bit descriptions. Better explanation of use
                              of BCDECC register. Add detail to WRITE_UNLOCK explanation.

Rebecca Stamm   3-Feb-1990    External release. Eliminated BCEDHI and BCEDLO IPRs. Made
                              updates based on internal review.

Rebecca Stamm   19-Jan-1990   Release for internal review.

Rebecca Stamm   13-Jan-1990   Intermediate release. Many edits. Eliminated backup cache data
                              RAM access through IPR reads and writes. Updated Cbox internal
                              bussing diagrams and description. Write queue is 8 entries.

Rebecca Stamm   21-Mar-1989   Release for external review.

Rebecca Stamm   16-Mar-1989   Release for internal review.

Chapter 14 
Vector Interface 



14.1 Description 

The NVAX CPU chip does not fully support the VAX vector instruction set and any attempt to 
execute a vector instruction will result in a reserved instruction fault. Vector instructions are 
listed in Table 14-1. 

Table 14-1: Vector Instruction Set 

Opcode  Instruction

31FD    MFVP regnum.rw, dst.wl
34FD    VLDL cntrl.rw, base.ab, stride.rl
35FD    VGATHL cntrl.rw, base.ab
36FD    VLDQ cntrl.rw, base.ab, stride.rl
37FD    VGATHQ cntrl.rw, base.ab
80FD    VVADDL cntrl.rw
81FD    VSADDL cntrl.rw, scal.rl
82FD    VVADDG cntrl.rw
83FD    VSADDG cntrl.rw, scal.rq
84FD    VVADDF cntrl.rw
85FD    VSADDF cntrl.rw, scal.rl
86FD    VVADDD cntrl.rw
87FD    VSADDD cntrl.rw, scal.rq
88FD    VVSUBL cntrl.rw
89FD    VSSUBL cntrl.rw, scal.rl
8AFD    VVSUBG cntrl.rw
8BFD    VSSUBG cntrl.rw, scal.rq
8CFD    VVSUBF cntrl.rw
8DFD    VSSUBF cntrl.rw, scal.rl
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Table 14-1 (Cont.): Vector Instruction Set 

Opcode Instruction 

8EFD WSUBD cn.trl.rw 

8FFD VSSUBD cntrl.rw, scaLrq 

9CFD VSTL cntrLrw, base.ab, stridcrl 

9DFD VSCATL cntrLrw, base.ab 

9EFD VSTQ cntrLrw, base.ab, stride.rl 

9FFD VSCATQ cntrLrw, base.ab 

AOFD WMULL cntrLrw 

A1FD VSMULL cntrLrw, scal.rl 

A2FD WMULG cntrLrw 

A3FD VSMULG cntrLrw, scaLrq 

A4FD WMULF cntrLrw 

A5FD VSMULF cntrLrw, scal.rl 

A6FD WMUID cntrl.rw 

A7FD VSMULD cntrLrw, scal.rq 

A8FD VSYNC regnum.rw 

A9FD MTVP regnum.rw, srcrl 

AAFD WDIVG cntrLrw 

ABFD VSDIVG cntrLrw, scaLrq 

ACFD WDIVF cntrLrw 

ADFD VSDIVF cntrLrw, scal.rl 

AEFD WDIVD cntrLrw 

AFFD VSDIVD cntrLrw, scal.rq 

COFD WCMPL cntrLrw 

C1FD VSCMPL cntrLrw, scaLrl 

C2FD WCMPG cntrLrw 

C3FD VSCMPG cntrLrw, scal.rq 

C4FD WCMPF cntrLrw 

C5FD VSCMPF cntrLrw, scaLrl 

C6FD WCMPD cntrLrw 

C7FD VSCMPD cntrLrw, scaLrq 

C8FD WBISL cntrLrw 

C9FD VSBISL cntrLrw, Bcal.rl 

CCFD WBICL cntrLrw 
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Table 14-1 (Cont.): Vector Instruction Set 




CDPD 



VSBICL cntrLrw, scalrl 



EOFD 



WSRLL cntrLrw 



ECFD 



EFFD 



EEFD 



E9FD 



E8FD 



E1FD 



E5FD 



E4FD 



EDFD 



VSSRLL cntrLrw, scal.rl 
WSLLL cntrLrw 
VSSLLL cntrLrw, scaLrl 
WXORL cntrLrw 
VSXORL cntrLrw, scaLrl 
WCVT cntrLrw 
IOTA cntrLrw. scaLrl 
WMERGE cntrLrw 
VSMERGE cntrLrw, scal.rq 



Although the vector instruction set is not fully implemented, some residual support is included 
in the NVAX CPU chip and should be considered: 

• The Ibox, under control of the IROM, decodes the vector instructions listed above, including 
parsing and processing the instruction specifiers. If a memory management exception is 
detected on the instruction or one of the specifiers, the Ibox will report it to the Ebox, which 
will ignore it in favor of reporting a reserved instruction fault instead. However, if a hardware 
error is detected during the processing of the vector instruction or specifiers, that error will 
be reported in the usual way. 

• The ECR<VECTOR_PRESENT> bit remains in the hardware, but a reserved instruction fault 
will result if a vector instruction is executed, independent of the state of this bit. 

• A vector disabled fault will never be generated by the NVAX CPU chip microcode.

• References to vector processor registers in the range 90-97 (hex) are intercepted by the
microcode and are not transmitted on the NDAL as is the normal case for an unimplemented
processor register. Rather, writes to these registers are ignored, and reads from these registers
return 0. The operating system depends on this behavior.
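
The handling of the vector processor registers described in the last bullet can be pictured with a small model. This is only a Python sketch of the stated behavior, not NVAX microcode; the function names and the "forwarded" return value are hypothetical and stand in for paths this chapter does not describe.

```python
# Hypothetical model of how accesses to processor registers 90-97 (hex)
# behave according to the text above. Nothing here is NVAX microcode.
VECTOR_IPR_FIRST = 0x90
VECTOR_IPR_LAST = 0x97


def is_vector_ipr(regnum):
    """True for the vector processor register numbers intercepted by microcode."""
    return VECTOR_IPR_FIRST <= regnum <= VECTOR_IPR_LAST


def read_ipr(regnum):
    """Reads of intercepted registers return 0; other registers are not modeled."""
    if is_vector_ipr(regnum):
        return 0
    return "forwarded"  # placeholder for any path outside this sketch


def write_ipr(regnum, value):
    """Writes to intercepted registers are ignored; other registers are not modeled."""
    if is_vector_ipr(regnum):
        return "ignored"
    return "forwarded"  # placeholder for any path outside this sketch
```

Because the operating system depends on this behavior, a test bench for system software could use a model like this to confirm that probing registers 90-97 (hex) neither faults nor returns nonzero data.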
14.2 Revision History

Table 14-2: Revision History

Who          When          Description of change

Mike Uhler   06-Jan-1990   Initial release

Mike Uhler   02-Feb-1991   Update after pass 1 PG.
Chapter 15
Error Handling 



This chapter describes the NVAX CPU error exceptions and interrupts as seen from the macrocoder's
point of view. It is organized with respect to the SCB vectors through which each event is
dispatched. The SCB layout and SCB vector format are described in Chapter 2. Exceptions and
interrupts that are a result of normal system operation are described in Chapter 2.

15.1 Terminology 



Term                 Meaning

Fill                 Any quadword of data returned to the NVAX CPU chip in response to a
                     read-type operation. The quadword containing the requested data is a fill.

Ownership bit        In the Bcache and the memory, a bit is stored with each hexaword called the
                     ownership bit. In the Bcache it indicates the Bcache owns the block; it has
                     the one valid copy of the data. In memory it indicates some cache or bus
                     interface has the one good copy of the block, not the memory.

Memory cache state   In memory in various system environments, a certain amount of state is kept
                     for each hexaword in memory. This state always includes the ownership bit.
                     In some system environments, it includes additional information.

ETM                  Error transition mode in the Bcache: in this mode the Bcache is not used
                     unless it owns the addressed block. It continues to respond to NDAL
                     coherency requests which require writeback.



15.2 Error Handling Introduction and Summary

This chapter discusses all levels of hardware and microcode-detected errors. Error notification
occurs through one of the following events, listed in order of decreasing severity.

• Console error halt: A halt to console mode is caused by one of several errors such as Interrupt
Stack Not Valid. For certain halt conditions, the console prompts for a command and waits
for operator input. For other halt conditions, the console may attempt a system restart or a
system bootstrap as defined by DEC Standard 032. The actual algorithms used are outside
of the scope of this document.

• Machine check: A hardware error occurred synchronously with respect to the execution of
instructions. Instruction-level recovery and retry may be possible.

• Power fail: The power supply asserted the power fail signal.

• Hard error interrupt: A hardware error occurred asynchronously with respect to the execution
of instructions. Usually, data is lost or state is corrupted, and instruction-level recovery
may not be possible.

• Soft error interrupt: A hardware error occurred asynchronously with respect to the execution
of instructions. The error is not fatal to the execution of instructions, and instruction-level
recovery is usually possible.

• Kernel stack not valid: During exception processing, a memory management exception
occurred while trying to push information on the kernel stack.

This chapter explains in detail several of the SCB entry points. The purpose is to help the 
operating system programmer determine exactly what error occurred and to recommend an error 
recovery method. Since this chapter is only concerned with errors which are generic to all system 
environments, it may be used as the basis for a specification of error handling and recovery for 
particular systems based on the NVAX CPU chip. 

The following information is given in this chapter for each SCB entry point: 

• What parameters are pushed on the stack. 

• What failure codes are defined.

• What additional information exists and should be collected for analysis. 

• How to determine what error(s) actually occurred. 

• How to restore the state of the machine, and what level of recovery is possible. 

Table 15-1 shows the general error categories associated with each of these error notifications.

Table 15-1: Error Summary By Notification Entry Point

                 SCB Index
Entry Point      (hex)       General Error Categories

Console Halt     N/A         Interrupt stack not valid, kernel-mode halt,
                             double error, illegal SCB vector,
                             initial power up, HALT_L assertion

Machine Check    04          Memory management, interrupt, microcode detected CPU errors,
                             CPU stall timeout,
                             TB parity errors, VIC tag or data parity errors,
                             Bcache uncorrectable data read errors,
                             memory/NDAL read errors (no-ACK, timeout, or RDE from system
                             environment)

Power Fail       0C          System environment notification via PWRFL_L

Soft Error       54          VIC tag or data parity errors,
Interrupt                    Pcache tag or data parity errors,
                             Bcache uncorrectable tag errors,
                             Bcache uncorrectable data read errors,
                             Bcache uncorrectable data errors in writebacks,
                             Bcache correctable tag and data errors,
                             memory/NDAL read errors (no-ACK, timeout, or RDE on reads),
                             NDAL parity errors,
                             system environment notification via S_ERR_L

Hard Error       60          Bcache uncorrectable data errors on write operations,
Interrupt                    NDAL no-ACK on writes,
                             Bcache fill errors in NDAL ownership reads after merging write data
                             in the cache data RAMs,
                             system environment notification via H_ERR_L



15.3 Error Handling and Recovery 

All errors (except those resulting in console halt) go through SCB vector entry points and are 
handled by service routines provided by the operating system. A console halt transfers control to 
a hardware-prescribed IO-space address. Software driven recovery or retry is not recommended 
for errors resulting in console halt. 

Software error handling (by operating system routines) can be logically divided into the following 
steps: 

• State collection. 

• Analysis. 

• Recovery. 

• Retry. 

These steps are discussed in general in the next four sections. After that, details are supplied on 
analysis, recovery and retry for each error event which results in an exception or interrupt. This 
information is organized by SCB entry point. 

15.3.1 Error State Collection 

Before error analysis can begin, all relevant state must be collected. The stack frame provides 
the PC/PSL pair for all exceptions and interrupts. For machine checks, the stack frame also 
provides details about the error. 

In addition to the stack frame, machine checks and hard and soft error interrupts usually require 
analysis of other registers. It is strongly recommended that all the state listed below be read 
and saved in these cases. State is saved prior to analysis so that analysis is not complicated by 
changes in state in the registers as the analysis progresses, and so that errors incurred during 
analysis and recovery can be processed with that context. 

Ibox 

ICSR: Ibox (VIC) control and status register. 
VMAR: VIC memory address register. 

Ebox 

ECR: Ebox control and status register. 
Mbox

TBSTS: TB status register. 
TBADR: TB address register. 
PCSTS: Pcache status register. 
PCADR: Pcache address register. 

Cbox 

CCTL: Cbox Control Register. 
BCEDSTS: Bcache data error status register. 
BCEDIDX: Bcache data error index register. 
BCEDECC: Bcache data error ECC/syndrome register. 
BCETSTS: Bcache tag error status register. 
BCETIDX: Bcache tag error index/address register.
BCETAG: Bcache errored tag register. 
CEFSTS: Read and Bcache fill status register. 
CEFADR: Read and Bcache fill address register. 
NESTS: NDAL error status register. 
NEOADR: NDAL error output address register. 
NEOCMD: NDAL error output command register. 
NEICMD: NDAL error input command register. 
NEDATHI: NDAL error input data register (HI). 
NEDATLO: NDAL error input data register (LO). 

System environment 

All states (i.e., CSRs) which report error conditions or events. 

For the purposes of the rest of this chapter, it is assumed that each of these states is saved in a
variable whose name is constructed by prepending "S_" to the register name. For example, the
ICSR would be saved in the variable S_ICSR.

The following example shows allocation of memory storage for the error state. 

; ERROR STATE COLLECTION DATA STORAGE

                                ; IBOX
S_ICSR:         .LONG   0       ; IBOX VIC CONTROL AND STATUS REGISTER
S_VMAR:         .LONG   0       ; IBOX VIC ERROR ADDRESS REGISTER

                                ; EBOX
S_ECR:          .LONG   0       ; EBOX CONTROL AND STATUS REGISTER

                                ; MBOX
S_TBSTS:        .LONG   0       ; TB STATUS REGISTER
S_TBADR:        .LONG   0       ; TB ERROR ADDRESS REGISTER
S_PCSTS:        .LONG   0       ; PCACHE STATUS REGISTER
S_PCADR:        .LONG   0       ; PCACHE ERROR ADDRESS REGISTER

                                ; CBOX
S_CCTL:         .LONG   0       ; CBOX CONTROL REGISTER
S_BCEDSTS:      .LONG   0       ; BCACHE DATA RAM ERROR STATUS REGISTER
S_BCEDIDX:      .LONG   0       ; BCACHE DATA RAM ERROR INDEX REGISTER
S_BCEDECC:      .LONG   0       ; BCACHE DATA RAM ECC/SYNDROME REGISTER
S_BCETSTS:      .LONG   0       ; BCACHE TAG RAM ERROR STATUS REGISTER
S_BCETIDX:      .LONG   0       ; BCACHE TAG RAM ERROR INDEX REGISTER
S_BCETAG:       .LONG   0       ; BCACHE TAG RAM ERRORED TAG REGISTER
S_CEFSTS:       .LONG   0       ; READ AND BCACHE FILL ERROR STATUS REGISTER
S_CEFADR:       .LONG   0       ; READ AND BCACHE FILL ERROR ADDRESS REGISTER
S_NESTS:        .LONG   0       ; NDAL ERROR STATUS REGISTER
S_NEOADR:       .LONG   0       ; NDAL OUTPUT ERROR ADDRESS REGISTER
S_NEOCMD:       .LONG   0       ; NDAL OUTPUT ERROR COMMAND REGISTER
S_NEICMD:       .LONG   0       ; NDAL INPUT ERROR COMMAND REGISTER
S_NEDATHI:      .LONG   0       ; NDAL INPUT ERROR ADDRESS REGISTER (HI)
S_NEDATLO:      .LONG   0       ; NDAL INPUT ERROR ADDRESS REGISTER (LO)

                                ; SYSTEM ENVIRONMENT:
                                ; ERROR REGISTERS FROM THE SYSTEM ENVIRONMENT (MODULE, MEMORY(S),
                                ; BUS INTERFACE(S)) ARE SAVED HERE


The following example shows collection of error state which would normally be done early in 
the error handling routine. Note the handling of error registers which may be overwritten in 
the event of a more severe error. For example, after a correctable Bcache data RAM error, 
BCEDIDX would hold the index of the correctable error. If an uncorrectable Bcache data RAM 
error occurs, BCEDIDX would be reloaded with the index of the more severe uncorrectable error.
To ensure the data in BCEDIDX and BCEDECC matches the report in BCEDSTS, a conditional 
test is performed and these two registers are recaptured if both an uncorrectable and correctable 
error are reported in BCEDSTS. Otherwise, BCEDIDX and BCEDECC could reflect a previous 
correctable error even though BCEDSTS reports a more severe error. 



SAVE_STATE:
                                                ;IBOX
        MFPR    #PR19$_ICSR,S_ICSR
        MFPR    #PR19$_VMAR,S_VMAR

                                                ;EBOX
        MFPR    #PR19$_ECR,S_ECR

                                                ;MBOX
        MFPR    #PR19$_TBSTS,S_TBSTS
        MFPR    #PR19$_TBADR,S_TBADR
        MFPR    #PR19$_PCSTS,S_PCSTS
        MFPR    #PR19$_PCADR,S_PCADR

                                                ;CBOX
        MFPR    #PR19$_CCTL,S_CCTL
        MFPR    #PR19$_BCEDIDX,S_BCEDIDX
        MFPR    #PR19$_BCEDECC,S_BCEDECC
        MFPR    #PR19$_BCEDSTS,S_BCEDSTS
        BICL3   #^C<BCEDSTS$M_CORR!BCEDSTS$M_LOCK>,S_BCEDSTS,R0
        CMPL    R0,#BCEDSTS$M_CORR!BCEDSTS$M_LOCK
        BNEQ    10$
        MFPR    #PR19$_BCEDIDX,S_BCEDIDX
        MFPR    #PR19$_BCEDECC,S_BCEDECC
10$:    MFPR    #PR19$_BCETIDX,S_BCETIDX
        MFPR    #PR19$_BCETAG,S_BCETAG
        MFPR    #PR19$_BCETSTS,S_BCETSTS
        BICL3   #^C<BCETSTS$M_CORR!BCETSTS$M_LOCK>,S_BCETSTS,R0
        CMPL    R0,#BCETSTS$M_CORR!BCETSTS$M_LOCK
        BNEQ    20$
        MFPR    #PR19$_BCETIDX,S_BCETIDX
        MFPR    #PR19$_BCETAG,S_BCETAG
20$:    MFPR    #PR19$_CEFSTS,S_CEFSTS
        MFPR    #PR19$_CEFADR,S_CEFADR
        MFPR    #PR19$_NESTS,S_NESTS
        MFPR    #PR19$_NEOADR,S_NEOADR
        MFPR    #PR19$_NEOCMD,S_NEOCMD
        MFPR    #PR19$_NEICMD,S_NEICMD
        MFPR    #PR19$_NEDATHI,S_NEDATHI
        MFPR    #PR19$_NEDATLO,S_NEDATLO

                                                ;SYSTEM ENVIRONMENT
        ; COLLECTION OF SYSTEM ENVIRONMENT ERROR REGISTERS GOES HERE
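
The conditional recapture performed by the BICL3/CMPL/BNEQ sequence can be restated in a few lines of Python. This is a sketch for explanation only: the bit positions chosen for CORR and LOCK below are hypothetical placeholders, not the actual BCEDSTS or BCETSTS bit assignments.

```python
# Hypothetical bit masks; the real BCEDSTS$M_CORR and BCEDSTS$M_LOCK values
# are defined by the register description, not by this sketch.
CORR = 0x1  # a correctable error has been reported (assumed position)
LOCK = 0x2  # a more severe error has been reported (assumed position)


def needs_recapture(status):
    """Mirror of BICL3 #^C<CORR!LOCK>,status,R0 followed by CMPL R0,#CORR!LOCK.

    BICL3 with the complemented mask keeps only the CORR and LOCK bits.
    The index and ECC registers are re-read only when both bits are set,
    that is, when the branch BNEQ is not taken.
    """
    both = CORR | LOCK
    return (status & both) == both
```

With these placeholder values, a status word holding only CORR or only LOCK does not trigger the second read, while a word holding both does, regardless of any other bits that are set.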

Additional state collection is recommended while/after flushing the Bcache because certain errors 
may occur as a result of the flush operation. The following state should be collected immediately 
after flushing each Bcache location. 

Cbox 

CCTL: Cbox Control Register. 
BCEDSTS: Bcache data error status register. 
BCEDIDX: Bcache data error index register. 
BCEDECC: Bcache data error ECC/syndrome register. 
BCETSTS: Bcache tag error status register. 
BCETIDX: Bcache tag error index/address register.
BCETAG: Bcache errored tag register. 
NESTS: NDAL error status register. 
NEOADR: NDAL error output address register. 
NEOCMD: NDAL error output command register. 

System environment 

All states (i.e., CSRs) which report the event of NVAX sending a BADWDATA cycle on the 
NDAL. 

For the purposes of the rest of this chapter, it is assumed that each of these states is saved in a 
variable whose name is constructed by prepending "SS_" to the register name. For example, the 
BCEDSTS register would be saved in the variable SS_BCEDSTS. 

The following example shows allocation of memory storage for additional error state collected 
while/after flushing the Bcache. 

; ADDITIONAL ERROR STATE COLLECTION DATA STORAGE FOR AFTER BCACHE FLUSH

                                ; CBOX
SS_CCTL:        .LONG   0       ; CBOX CONTROL REGISTER
SS_BCEDSTS:     .LONG   0       ; BCACHE DATA RAM ERROR STATUS REGISTER
SS_BCEDIDX:     .LONG   0       ; BCACHE DATA RAM ERROR INDEX REGISTER
SS_BCEDECC:     .LONG   0       ; BCACHE DATA RAM ECC/SYNDROME REGISTER
SS_BCETSTS:     .LONG   0       ; BCACHE TAG RAM ERROR STATUS REGISTER
SS_BCETIDX:     .LONG   0       ; BCACHE TAG RAM ERROR INDEX REGISTER
SS_BCETAG:      .LONG   0       ; BCACHE TAG RAM ERRORED TAG REGISTER
SS_NESTS:       .LONG   0       ; NDAL ERROR STATUS REGISTER
SS_NEOADR:      .LONG   0       ; NDAL OUTPUT ERROR ADDRESS REGISTER
SS_NEOCMD:      .LONG   0       ; NDAL OUTPUT ERROR COMMAND REGISTER

                                ; SYSTEM ENVIRONMENT:
                                ; ADDITIONAL ERROR STATE COLLECTION DATA STORAGE FOR AFTER BCACHE FLUSH
                                ; REGISTERS WHICH ARE AFFECTED BY A BADWDATA CYCLE FROM NVAX ARE SAVED HERE
                                ; AFTER THE BCACHE FLUSH
The following example shows collection of error state which would normally be done during
and just after flushing the Bcache.

AFTER_BCFLUSH:
                                                ;CBOX
        MFPR    #PR19$_CCTL,SS_CCTL
        MFPR    #PR19$_BCEDIDX,SS_BCEDIDX
        MFPR    #PR19$_BCEDECC,SS_BCEDECC
        MFPR    #PR19$_BCEDSTS,SS_BCEDSTS
        BICL3   #^C<BCEDSTS$M_CORR!BCEDSTS$M_LOCK>,SS_BCEDSTS,R0
        CMPL    R0,#BCEDSTS$M_CORR!BCEDSTS$M_LOCK
        BNEQ    30$
        MFPR    #PR19$_BCEDIDX,SS_BCEDIDX
        MFPR    #PR19$_BCEDECC,SS_BCEDECC
30$:    MFPR    #PR19$_BCETIDX,SS_BCETIDX
        MFPR    #PR19$_BCETAG,SS_BCETAG
        MFPR    #PR19$_BCETSTS,SS_BCETSTS
        BICL3   #^C<BCETSTS$M_CORR!BCETSTS$M_LOCK>,SS_BCETSTS,R0
        CMPL    R0,#BCETSTS$M_CORR!BCETSTS$M_LOCK
        BNEQ    40$
        MFPR    #PR19$_BCETIDX,SS_BCETIDX
        MFPR    #PR19$_BCETAG,SS_BCETAG
40$:    MFPR    #PR19$_NESTS,SS_NESTS
        MFPR    #PR19$_NEOADR,SS_NEOADR
        MFPR    #PR19$_NEOCMD,SS_NEOCMD

                                                ;SYSTEM ENVIRONMENT

15.3.2 Error Analysis

With the error state obtained during the collection process, the error condition can be analyzed. 
The purpose is to determine what error event caused the particular notification being handled (to 
the extent possible), and what other errors may also have occurred. Analysis of machine checks 
and hard and soft error interrupts should be guided by the parse trees given in the appropriate 
sections below. 

NOTE 

Errors detected in or by one of the caches usually result in the cache automatically 
being disabled. However, to minimize the possibility of nested errors, it is suggested 
that error analysis and recovery for memory or cache-related errors be performed with 
the Pcache disabled and the Bcache in ETM. 

In some cases, a notification for a single error occurs in two ways. For example, an uncorrectable 
error in the Bcache data RAMs will cause a soft error interrupt and may also cause a machine 
check. Software should handle cases where a machine check handler clears error bits and then 
the soft error handler is entered with no error bits set. 

In certain cases one error event results in two related reports. For example, a Bcache
uncorrectable data error during a writeback will be reported in NESTS as a BADWDATA event.
In this case, the BADWDATA event captures the full address of the errored data (that is why
BADWDATA is an error event). Cases like this are handled as single error events.

In general, an error reporting register can report events which lead to machine check, soft error,
or hard error. A given error event can result in machine check and soft error interrupt, or in
just one or the other. Events which lead to hard error interrupts generally cannot also cause
machine check or soft error interrupt. Sometimes an error event which leads to machine check or
soft error interrupt is closely related to an event which leads to hard error interrupt (e.g., Bcache
fill error on first quadword of a fill for an OREAD done for a write causes soft error interrupt,
but the same error on a later quadword causes hard error interrupt).

Multiple simultaneous errors may make useful recovery impossible. However, in cases where 
no conflict exists in the reporting of the multiple errors (i.e., no one error register is used to 
report two errors), and recovery from each error is possible, then recovery from the set of errors 
is accomplished by recovering from all of them. For example, recovery from a Pcache tag parity 
error and a Bcache correctable data error being reported together is possible by following the 
recovery procedures for each error in sequence. 
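
The condition stated above (no error register shared between two errors, and each error individually recoverable) can be written as a simple check. The following Python sketch is illustrative only; the tuple layout and the function name are assumptions, and the register sets in the usage note are taken from the register descriptions in this chapter rather than from any defined software interface.

```python
def can_recover_together(errors):
    """Return True if the set of errors meets the stated condition.

    errors is a list of (name, registers, recoverable) tuples, where
    registers is the set of error registers used to report that error.
    This layout is hypothetical and exists only for this sketch.
    """
    seen = set()
    for _name, registers, recoverable in errors:
        if not recoverable:
            return False          # one unrecoverable error defeats the set
        if seen & set(registers):
            return False          # two errors reported in the same register
        seen |= set(registers)
    return True
```

For example, a Pcache tag parity error reported in PCSTS and PCADR and a Bcache correctable data error reported in BCEDSTS, BCEDIDX, and BCEDECC share no register, so the check passes and each recovery procedure can be followed in sequence.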

The error cause determination parse tree for machine check exception is directed at causes or 
possible causes of machine checks. It ignores errors which lead to hard or soft error interrupts 
but not to machine checks. Similarly, the hard error interrupt cause determination ignores 
errors which lead to machine check or soft error interrupt, and the soft error interrupt cause 
determination ignores errors which lead to machine check or hard error interrupt. 

There is a natural order between machine check, hard error interrupt, and soft error interrupt 
because the IPL for hard error interrupts is higher than that of soft error interrupts and the IPL 
in the machine check exception is higher than either of the error interrupts. This hierarchy is 
important because knowledge of which notification event occurred is used to discriminate between 
certain error events (e.g., an error on the initial fill quadword for a read-lock is distinguished from 
a fill error on a subsequent quadword by the fact of machine check notification). 

15.3.3 Error Recovery 

Recovery from errors consists of clearing any latched error state, repairing damaged state 
(if necessary and possible), and restoring the system to normal operation. There are special 
considerations involved in analysis and recovery from cache or memory errors, which are covered 
in the next sections. 

Recovery from multiple error scenarios is possible when there is no conflict in the error registers 
which report the errors and there is no conflict in the recovery procedures for the errors. However 
all recovery procedures in this chapter assume that only one error is present. None of the 
procedures are valid in multiple error scenarios without further analysis. 

In some instances, it may be desirable to stop using the hardware which is the source of a large 
number of errors. For example, if a cache reports a large number of errors, it may be better to 
disable it. It is suggested that software maintain error counts which should be compared against 
error thresholds on every error report. If the count (per unit time) exceeds the threshold, the 
hardware should be disabled. 
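
One way to keep such a count per unit time is a sliding window of error timestamps. The Python sketch below is a minimal illustration under assumed parameters; the class name, the threshold, and the window length are hypothetical and would be chosen by the operating system for each hardware unit.

```python
from collections import deque


class ErrorRateMonitor:
    """Counts error reports inside a sliding time window (hypothetical helper)."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def report(self, now):
        """Record one error at time `now`; return True if the unit should be disabled."""
        self.timestamps.append(now)
        # Drop reports that have aged out of the window.
        while now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold
```

A separate monitor would be kept for each cache or other unit, and the tag store monitor would be given a much lower threshold than the others, in line with the note that follows.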

NOTE 

Hard failures of one bit in the tag store can lead to unrecoverable errors requiring a 
full system crash. It would be appropriate to have an extremely low threshold for tag 
store correctable errors, especially if they recur in the same location or bit position. 

NOTE 

NVAX CPU utilization of the NDAL and memory is extremely high if the Bcache is
disabled. In multiprocessor systems a CPU should probably be removed from the
system rather than being used with the Bcache off. In a single processor system there
may be effects on I/O subsystem performance and latency due to the high NDAL and
memory utilization.

15.3.3.1 Special Considerations for Cache and Memory Errors 

Cache and memory error recovery requires special considerations: 

• Cache and memory error recovery should always be done with the Pcache and VIC off and 
the Bcache in error transition mode (ETM). (In certain cases, the last part of recovery must 
be done with the Bcache off.) See Section 15.3.3.1.1.1, Cache Enable, Disable, and Flush 
Procedures. 

• Bcache flush is necessary before re-enabling the Bcache whenever it is in ETM. See 
Section 15.3.3.1.1, Cache Coherence in Error Handling. 

• Bcache flush should always be done one block at a time, recapturing the relevant error
registers between each block flush.

• Cache coherence requires a specific procedure for re-enabling the caches. See 
Section 15.3.3.1.1, Cache Coherence in Error Handling. 

• Error recovery should be performed starting with the most distant component and working 
toward the CPU and Ebox. System environment memory errors should be processed first, 
followed by NDAL errors, Bcache fill errors, Bcache tag store and data RAM errors, Pcache 
errors, TB errors, and, finally, VIC errors. 

• NDAL errors are cleared by writing the write-one-to-clear bits in NESTS. The suggested way 
to do this is to write a one to the specific error bit. 

• Bcache fill errors are cleared by writing the write-one-to-clear bits in CEFSTS. The suggested 
way to do this is to write a one to the specific error bit. Special recovery procedures may be 
necessary after Bcache fill errors. See Section 15.3.3.1.2, Special Writeback Cache Recovery 
Situations and Procedures. 

• Bcache tag store errors are cleared by writing the write-one-to-clear bits in BCETSTS. The 
suggested way to do this is to write a one to the specific error bit. Special recovery procedures 
may be necessary after Bcache uncorrectable tag store errors. See Section 15.3.3.1.2, Special 
Writeback Cache Recovery Situations and Procedures. 

• Bcache data RAM errors are cleared by writing the write-one-to-clear bits in BCEDSTS. The 
suggested way to do this is to write a one to the specific error bit. Special recovery procedures 
may be necessary after Bcache uncorrectable data RAM errors. See Section 15.3.3.1.2, Special 
Writeback Cache Recovery Situations and Procedures. 

• Hardware ETM is cleared by writing the write-one-to-clear bit in CCTL. The suggested way 
to do this is to write the value saved during error state collection back to the register. 

• Pcache tag and data store errors are cleared by writing the write-one-to-clear bits in PCSTS. 
The suggested way to do this is to write a one to the specific error bit. Pcache flush is necessary 
after Pcache tag store parity errors. See Section 15.3.3.1.1.1, Cache Enable, Disable, and 
Flush Procedures. 

• TB errors are cleared by writing the write-one-to-clear bits in TBSTS. The suggested way to 
do this is to write a one to the specific error bit. 

• PTE read errors are cleared by writing the PTE error write-one-to-clear bits in PCSTS. The 
suggested way to do this is to write a one to the specific error bit. 



DIGITAL CONFIDENTIAL 



Error Handling 1 5-9 



NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 



• VIC errors are cleared by writing the write-one-to-clear bits in ICSR. The suggested way
to do this is to write a one to the specific error bit. VIC flush and re-enable is necessary
after VIC tag store parity errors. See Section 15.3.3.1.1.1, Cache Enable, Disable, and Flush
Procedures.
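
The recovery order given in the list above, together with the register in which each class of error is cleared, can be summarized in a small table-driven sketch. The Python below is illustrative only; the category strings and the function name are hypothetical, and the system environment entry has no register because it depends on the particular system.

```python
# Recovery order from most distant component toward the CPU, paired with the
# register whose write-one-to-clear bits clear that class of error.
RECOVERY_ORDER = [
    ("system environment memory", None),   # system-specific CSRs
    ("NDAL", "NESTS"),
    ("Bcache fill", "CEFSTS"),
    ("Bcache tag store", "BCETSTS"),
    ("Bcache data RAM", "BCEDSTS"),
    ("Pcache", "PCSTS"),
    ("TB", "TBSTS"),
    ("VIC", "ICSR"),
]

_RANK = {name: rank for rank, (name, _reg) in enumerate(RECOVERY_ORDER)}


def order_for_recovery(pending):
    """Sort pending error categories into the recommended recovery order."""
    return sorted(pending, key=lambda name: _RANK[name])


def write_one_to_clear(register_value, error_bit):
    """Model of a write-one-to-clear field: writing a one clears only that bit."""
    return register_value & ~error_bit
```

Writing only the specific error bit, as the list suggests, leaves any other latched error bits in the register intact for later analysis.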

15.3.3.1.1 Cache Coherence in Error Handling 

Certain procedures must be followed in order to maintain cache coherence while enabling NVAX 
caches. Since many errors cause caches to be disabled, and since cache and memory error recovery 
is normally done with the Pcache and VIC off and the Bcache in ETM, the complete cache enable 
procedure is done as part of recovery from all cache and memory errors. 

Once the Bcache is in ETM mode, it will not be coherent with memory if it is re-enabled 
before being flushed. This is because writes (from the Mbox) to blocks which happen to be 
VALID_UNOWNED in the Bcache are not copied into the Bcache data RAMs. These writes are 
only sent out on the NDAL. Once the Bcache is put in ETM by hardware or software action, a 
Bcache flush must be done before re-enabling the Bcache. The procedure is described in the next 
section. 

While the Bcache is in ETM or off, the Pcache will stay coherent with memory. However, before
the Bcache is re-enabled, the Pcache must be disabled. After the Bcache is re-enabled, the Pcache 
must be flushed before it is re-enabled. The procedure is described in the next section. If a Pcache 
tag parity error occurred, the flush procedure given is sufficient to clean up the Pcache tag store. 

The VIC (virtual instruction cache) is not automatically kept coherent with memory. It is flushed
as a side effect of the REI instruction (as required by the VAX architecture). Normally in error 
recovery, there is no definite need to flush the VIC. For consistency and for the sake of beginning 
error retry in a known state, flushing the VIC during error recovery is recommended. However, 
in the event of VIC tag parity errors, the complete VIC flush procedure described in the next 
section must be done. 

The TB is not automatically kept coherent with memory. Software uses the TBIS and TBIA 
functions to maintain coherence, and the LDPCTX instruction clears the process PTEs in the 
TB. Normally in error recovery, there is no definite need to flush the TB. For consistency and 
for the sake of beginning error retry in a known state, flushing the TB during error recovery is 
recommended. When a TB parity error occurs, Mbox hardware flushes the TB by itself (via an 
internally generated TBIA), but it would be appropriate for software to test the TB after a parity 
error. This is discussed in Section 15.3.3.1.3. 

15.3.3.1.1.1 Cache Enable, Disable, and Flush Procedures

To enable the NVAX caches, the caches are flushed and enabled in a specific order. The ordering is
necessary for coherence between the Bcache, Pcache, and memory. For simplicity, one procedure 
is given for enabling the NVAX caches, even though variations on the procedure may also produce 
correct results. Disabling the caches can be done in any order, though one procedure is given 
here. 

In error handling, the VIC and Pcache are disabled while the Bcache is placed in ETM. The 
Bcache flush from ETM procedure is done to turn off the Bcache altogether. The cache enable 
procedure assumes that the Bcache is completely off at the start. 
15.3.3.1.1.1.1 Disabling the NVAX Caches for Error Handling (Leaving the Bcache in ETM)

This is the procedure for disabling the NVAX caches (placing the Bcache in ETM): 

NOTE 

These procedures will be supplied with MACRO coding examples. 

• Disable the VIC: 

TBS (MTPR to ICSR) 

• Disable the Pcache: 

TBS (MTPR to PCCTL) 

• Put the Bcache in software ETM: 

TBS (MTPR to CCTL)

15.3.3.1.1.1.2 Flushing and Disabling the Bcache 

This is the procedure for flushing the Bcache and disabling it: 

• Flush and disable the Bcache: 

Errors can occur as a result of flushing the Bcache. Before carrying out the procedure, 
BCEDSTS and BCETSTS should be clear of unrecoverable errors, and NESTS should be clear 
of unrecoverable outgoing errors. The MTPRs to BCFLUSH IPRs should be done one block at 
a time, checking the BCEDSTS and BCETSTS error registers after each one. (The MFPR from 
BCEDSTS or BCETSTS will not finish until all the Bcache accesses which result from the MTPR 
to BCFLUSH are done.) Otherwise any unrecoverable error which occurs during the flush may 
become a lost unrecoverable error and a system crash will most likely be necessary. 

Errors which occur while flushing the Bcache are separate errors and should be handled 
independently of the initial error. However, certain errors may be expected during the flush 
procedure, based on the initial error. Also, the successful outcome of the Bcache flush procedure 
is important in determining whether to retry or restart the interrupted or machine checked 
instruction stream. 
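
The block-at-a-time flush with an error check after each MTPR might be organized as in the following Python sketch. It models the control flow only. The callables mtpr and mfpr are hypothetical stand-ins for the privileged instructions, the register names are used as plain string keys, and is_unrecoverable is a hypothetical predicate, because this section does not give the bit layout of BCEDSTS and BCETSTS.

```python
def flush_bcache(block_indices, mtpr, mfpr, is_unrecoverable):
    """Flush the Bcache one block at a time, checking for errors after each.

    Returns None if every block flushed cleanly, otherwise the index of the
    first block whose flush left an unrecoverable error in BCEDSTS or BCETSTS.
    All four arguments are assumptions made for this sketch.
    """
    for index in block_indices:
        # One MTPR to a BCFLUSH IPR flushes a single block.
        mtpr("BCFLUSH", index)
        # Reading a status register also waits until the Bcache accesses
        # caused by the MTPR have completed.
        for reg in ("BCEDSTS", "BCETSTS"):
            if is_unrecoverable(reg, mfpr(reg)):
                return index  # stop here so the error is not lost
    return None
```

Stopping at the first failing block keeps the error registers describing exactly one event, which is the reason the text gives for checking after every block.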

15.3.3.1.1.1.3 Enabling the NVAX Caches 

The procedure for enabling the NVAX caches after an error is the same as is used to initialize 
the caches after power-up (see Section 16.4, Cache Initialization). This procedure ensures that 
error retry/restart occurs with the caches in a known state. The procedure is outlined below. 

• All caches must be disabled, and the Bcache must be completely off (not just in ETM). 
Follow the above procedures to reach this state. 

• Flush the Bcache (Loop on MTPR to BCTAG IPRs). 

• Enable the Bcache (MTPR to CCTL). 

• Flush the Pcache (Loop on MTPR to PCTAG IPRs). 

• Enable the Pcache (MTPR to PCCTL). 

• Flush the TB: 

MTPR #0, #PR19$_TBIA 

• Flush the VIC (Loop on MTPRs to VMAR and VTAG, writing an initial value). 

• Enable the VIC (MTPR to ICSR). 
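
The ordering constraint in the list above can be captured as data, as in the following Python sketch. The step labels are informal names rather than specification mnemonics, and run_step is a hypothetical callable that would perform the actual IPR accesses.

```python
# Cache enable sequence from the list above, in the required order.
ENABLE_SEQUENCE = (
    ("flush", "Bcache"),   # loop on MTPR to BCTAG IPRs
    ("enable", "Bcache"),  # MTPR to CCTL
    ("flush", "Pcache"),   # loop on MTPR to PCTAG IPRs
    ("enable", "Pcache"),  # MTPR to PCCTL
    ("flush", "TB"),       # MTPR to TBIA
    ("flush", "VIC"),      # loop on MTPRs to VMAR and VTAG
    ("enable", "VIC"),     # MTPR to ICSR
)


def enable_caches(run_step, all_caches_off):
    """Run the enable sequence; refuse to start unless every cache is off."""
    if not all_caches_off:
        raise RuntimeError("Bcache, Pcache, and VIC must be fully disabled first")
    for action, unit in ENABLE_SEQUENCE:
        run_step(action, unit)
```

Keeping the sequence in one table makes it harder for an error handler and the power-up code to drift apart, since both are stated to use the same procedure.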

15.3.3.1.2 Special Writeback Cache Recovery Situations and Procedures 

Writeback caching can lead to a couple of special error cases. Some of them can be recovered. 
Sometimes, further state determination or state capture is required after the error cause 
determination guided by the parse trees in the sections on machine check exceptions and hard 
and soft errors. Further analysis may also be necessary. 

15.3.3.1.2.1 Bcache Uncorrectable Error During Writeback 

When a Bcache uncorrectable data RAM error occurs in a writeback, the status, cache index, and 
error syndrome are captured in BCEDSTS, BCEDIDX, and BCEDECC. As it is written back, the 
data is tagged-bad via the BADWDATA NDAL command. However, the address of the lost data is 
not captured in the Bcache error registers (for implementation reasons). For this reason, sending 
BADWDATA on the NDAL is treated as if it were an error by the bus interface unit (BIU). This 
means the full address is captured in NEOADR while the status is captured in NESTS. This 
writeback can sit in the writeback queue in the BIU for an indefinite amount of time. If a Bcache 
uncorrectable error on writeback is detected, but NESTS does not show any outgoing error status, 
the writeback queue must be drained to continue the analysis and recovery. This is most easily 
accomplished by the following IPR read. 

MFPR #PR19$_CWB, R0 

S_NESTS should be reloaded from NESTS after this operation. If S_NESTS does not show the 
BADWDATA error status after draining the writeback queue, and it shows no other outgoing 
error, then there is a serious inconsistency and the system should be crashed. 
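
The drain-and-recheck decision described above might be sketched in Python as follows. All four arguments are hypothetical callables: this section does not give the NESTS bit assignments, so the tests for BADWDATA status and for other outgoing errors are passed in rather than coded as bit masks.

```python
def nests_after_writeback_error(read_nests, drain_writeback_queue,
                                has_badwdata, has_other_outgoing_error):
    """Return "analyze" when NESTS accounts for the error, else "crash"."""
    s_nests = read_nests()
    if not (has_badwdata(s_nests) or has_other_outgoing_error(s_nests)):
        # The bad writeback may still be sitting in the BIU writeback queue.
        drain_writeback_queue()
        s_nests = read_nests()  # reload S_NESTS after draining
    if has_badwdata(s_nests) or has_other_outgoing_error(s_nests):
        return "analyze"
    return "crash"  # serious inconsistency
```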

15.3.3.1.2.2 Memory State 

Memory in NVAX systems supports the writeback cache by maintaining some amount of state for 
each hexaword (each cachable block) in memory. In XMI2 systems with XMA2 memory modules, 
an ownership bit, an interlock bit, and an owner ID are stored for each hexaword. In OMEGA 
systems, only an ownership bit is stored for each block. Other system environments are possible. 

The effect of a given error on the stored ownership bit in memory is system specific. Since the 
system environment is not directly aware of errors which occur inside the NVAX CPU chip, the 
system specific behavior is limited to the result of system environment errors. 

It is always assumed that an ownership read command no-ACKed on the NDAL doesn't affect 
the ownership bit in memory. Depending on the system, the state of memory's ownership bit 
(and other such state) may be UNPREDICTABLE or determinate after errors in data returned 
for ownership reads. If it is determinate, it may be set or reset, possibly depending on which fill 
quadword had the error and on the sort of error that occurred. 

This specification assumes that memory does not reset a set ownership bit on a WDISOWN until 
all four quadwords have been successfully received by memory (as is stated in Chapter 3). 

15.3.3.1.2.2.1 Accessing Memory State 

In recovering from certain errors it is necessary to read (or access by some means) the 
state memory has stored with each hexaword. This specification assumes a routine called 
MEMORY_STATE exists which returns this state given a block address. 

MEMORY_STATE may have system specific errors and side effects. For example, in XMI2 
systems this routine may cause a read timeout error in the memory module and a corresponding 
machine check. Software must be prepared to handle this. Before calling MEMORY_STATE, 
software should confirm that all registers which may end up reporting expected errors are clear 
of errors. This helps minimize the possibility that an unrelated error event is ignored because it 
appears to be an expected error. In the XMI2 example, within the NVAX CPU, CEFSTS is the 
register to check because a memory read timeout is the only error which is expected as a side 
effect of MEMORY_STATE. 
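
The check that precedes a call to MEMORY_STATE might look like the following Python sketch. The wrapper name, the mfpr callable, and the has_error predicate are all hypothetical. The list of registers to check is system specific, and in the XMI2 example above it would contain only CEFSTS.

```python
def call_memory_state(block_address, memory_state, mfpr,
                      expected_error_regs, has_error):
    """Call MEMORY_STATE only when the registers that may report its expected
    side-effect errors are clear, so an unrelated error is not mistaken for
    an expected one."""
    dirty = [reg for reg in expected_error_regs if has_error(reg, mfpr(reg))]
    if dirty:
        raise RuntimeError("pending errors in: " + ", ".join(dirty))
    return memory_state(block_address)
```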



15.3.3.1.2.2.2 Repairing Memory State (Fill Errors) 

In recovering from various Bcache fill errors it is necessary to reset the ownership state in memory. 
In some system environments, this can be done without writing the data in memory. In others 
resetting the ownership state may have the side effect of altering the data stored in the memory 
block. 

In cases where the fill error resulted from "lost"¹ data which cannot be recovered, the ownership 
bit may still be set in memory while no cache owns the block. If the data is private to one 
process, then the system may be able to continue operating after killing that one job. The system 
dependent procedure is then used to reset the ownership bit. 

For certain Bcache fill errors, an attempt is made to reset the ownership bit in memory, while 
maintaining or restoring the correct data to the memory block. 

• All the data is in memory. One or more quadwords of (the same) data are also in the cache. 
Memory's ownership bit is set (meaning it "thinks" a cache owns the block). The owner ID 
stored with the block in memory indicates this CPU. The cache tag for the block does not 
indicate the block is owned. (In general, if no writes to this block time out, and the block is 
private to one process, then the repair can be done.) 

• All the data is in memory. One or more quadwords of data are also in the cache, and one 
quadword has been altered by the Cbox in processing a write to that block from the Mbox. 
Memory's ownership bit is set (meaning it "thinks" a cache owns the block). The owner ID 
stored with the block in memory indicates this CPU. The cache tag for the block does not 
indicate the block is owned. (In general, if no writes to this block time out, and the block is 
private to one process, then the repair can be done.) 

NOTE 

If an owner ID for each block is not stored in memory, then recovery of the lost data 
is not recommended. The data should be treated as lost, and the appropriate system 
actions should be taken. 



¹ In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the 
data back when a read is done to that location. In some systems, it may be possible to identify which CPU memory 
"thinks" owns the data, but it is often not possible to determine which error caused this situation to arise. 

To recover from the first situation listed above in an XMI2 system, for instance, one of the correct 
quadwords in the Bcache is accessed (see Section 15.3.3.1.2.3) and used in the XMI2 procedure 
for resetting memory's ownership bit. The side effect of this procedure is that the data extracted 
from the Bcache is written to memory. Given that the block is private to one process and no 
writes have timed out in memory, this data is still correct. (Note that software must somehow 
ensure that no writes to this block are pending in the memory before beginning the repair. This 
can be done by waiting an amount of time equal to an XMA2 write timeout time.) 

To recover from the second situation listed above in an XMI2 system, the same procedure is 
followed, but the data written back is part of the known-altered quadword. The remainder of the 
known-altered quadword is written to the block after the repair. 

15.3.3.1.2.2.3 Repairing Memory State (Tagged-Bad Locations) 

In recovering from Bcache uncorrectable data RAM errors on writebacks it is necessary to reset 
the tagged-bad-data state for a block in memory. This is a system specific procedure. In general, 
before clearing the tagged-bad data state of memory, software must first ensure that no more 
accesses to the block can occur. Otherwise there is the danger that some process on some other 
processor or a DMA I/O device will see incorrect data and not detect an error. 

In XMI2, a sequence of operations involving writes to registers in a memory module followed by 
a write to the memory block in question is required. To do this the Bcache should be off, because 
NVAX will not issue a write to memory when the cache is enabled (or is in ETM and the block's 
tag indicates VALID-OWNED). 

In OMEGA, resetting tagged-bad-data state in memory requires that a full quadword write to the 
tagged-bad quadword be accomplished. The most straightforward way for NVAX software to do 
this is to fill in the Bcache tag store and data RAMs with a VALID-OWNED block and force a 
writeback (via a MTPR to BCFLUSH). 

15.3.3.1.2.3 Extracting Data from the Bcache 

To extract data from the Bcache, the Bcache is placed in FORCE_HIT mode. Before this is done, 
the Bcache must be off. 

With the Bcache flushed and disabled, set the Bcache in FORCE_HIT mode and extract the data. 
Note that the code which executes this procedure and its local data must be in I/O space. The 
TB entries (PTEs) which map this code and local data must be fixed in the TB. (This is most 
easily done by flushing the TB via an MTPR to TBIA and then accessing all the relevant pages 
in sequence.) Otherwise Bcache FORCE_HIT will interfere with instruction fetch, operand 
access, and PTE fetches in TB miss sequences. 

The following instruction places the Bcache in FORCE_HIT mode: 

TBS (MTPR to CCTL) 

With the Bcache in FORCE_HIT mode, a read in memory space of any address whose index portion 
matches the index of the cache data will return the data (provided there is no uncorrectable data 
RAM error). This is most easily accomplished by reading from the true address of the data. 




NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 



NOTE 

In FORCE_HIT mode, Bcache data RAM ECC errors are detected (unless 
CCTL<DISABLE_ERRORS> is set). Software should prepare for an ECC error 
(BCEDSTS unrecoverable error bits should be clear). 

The Bcache is restored to the disabled state by: 

TBS (MTPR to CCTL) 

15.3.3.1.2.4 Address Determination Procedure for Recovery from Uncorrectable Bcache Data 
RAM Errors 

After an uncorrectable data RAM error in the Bcache, only the index of the block is stored, not 
the complete physical address. The procedure for constructing the physical address of the error 
is given here. It depends on the assumption that the block has not been replaced. The detailed 
error descriptions only refer to this procedure when this assumption is valid. 

This is the procedure for constructing a physical address from the contents of S_BCEDIDX and the 
tag indicated by that register. It uses the Bcache tag ECC check routine found in Section 15.10. If 
an unrecoverable ECC error is found in the tag, then the address cannot be determined directly. 

NOTE 

The above procedure is used in the event of a Bcache data RAM error. If it fails 
because the tag also has an uncorrectable error, then the error should be considered 
unrecoverable. However, the search procedure described in the next section could be 
used to obtain useful information for the error log (specifically, which blocks this CPU 
has marked owned in memory for this cache index). 

15.3.3.1.2.5 Special Address Determination Procedure for Recovery from Uncorrectable Bcache 
Tag Store Errors 

An uncorrectable tag store error in the Bcache can cause certain interesting error cases. In some 
of these cases data may be lost (the copy in the Bcache was overwritten). In other cases, the 
data is still good in the cache. In all cases, the address of the lost data is not directly known. A 
special procedure must be used to determine this address. 

This section describes the generic address determination procedure for use in recovering from 
uncorrectable tag store errors. Specific error event descriptions in Section 15.5.2, Section 15.7.1, 
and Section 15.8.1 refer to this procedure for address determination. The possible outcomes of 
this procedure are: 

• The single address of a lost data block is found. Retry and recovery information for the error 
is found in the specific error event description which referred to this address determination 
procedure. 

• No address is found. It can be assumed that no block was owned by the Bcache (or the error 
was transient). Retry and recovery information for the error is found in the specific error 
event description which referred to this address determination procedure. 

• Multiple addresses are found. This is a multiple unrecoverable error situation, and the system 
should be crashed. 

The procedure for determining the address of a lost data block follows. Note that this procedure 
assumes the relevant tag in the Bcache is not (correct or correctable) VALID-OWNED. This 
procedure is for analyzing the result of errors in that tag. 

This procedure assumes that MEMORY_STATE will return the ownership state and the physical 
ID of the CPU which memory "thinks" owns the block. If memory does not store an owner ID 
and there is exactly one writeback cache in the system, then the lack of an owner ID might not 
prevent error recovery. 

• The Bcache should be in ETM. 

• Search for the address: 

(Search all memory block addresses whose index portion matches the index of the Bcache tag with the 
error. Check memory state for the block. If this CPU is the owner of that block, then the block is lost. 
Continue the search even if one lost block is found. Zero, one, or multiple lost blocks could be present. 
Note that in systems with no owner ID bits in memory and exactly one CPU, it may or may not be possible 
to assume that every owned block is owned by the CPU. It may be necessary to confirm each set owned bit 
by reading the marked location. If it is owned by this CPU, the read should time out.) 

NOTE 

This procedure is specific to recovering from tag store errors in one CPU. So when 
the memory state for a block indicates another cache in the system owns a particular 
block, that block is not counted as lost. That block may be "lost" in the more general 
sense (if the cache indicated as the owner no longer "knows" that it owns the block or 
is somehow unable to write it back.) The purpose here is only to find blocks that are 
definitely lost as a result of errors involving this CPU. 
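
The search and its three possible outcomes might be sketched in Python as follows. The sketch makes two labeled assumptions: memory_state is a hypothetical callable returning an (owned, owner_id) pair for a block address, and same_index_addresses assumes a direct-mapped cache so that addresses sharing an index are spaced one cache size apart. The cache and memory sizes are parameters, not values taken from this section.

```python
def same_index_addresses(block_address, cache_bytes, memory_bytes):
    """Block addresses in memory whose index portion matches block_address,
    assuming a direct-mapped cache (aliases are cache_bytes apart)."""
    first = block_address % cache_bytes
    return range(first, memory_bytes, cache_bytes)


def find_lost_blocks(candidate_addresses, memory_state, this_cpu_id):
    """Return (outcome, lost) where outcome is "none", "single", or "multiple"."""
    lost = []
    for addr in candidate_addresses:
        owned, owner_id = memory_state(addr)
        if owned and owner_id == this_cpu_id:
            lost.append(addr)  # keep searching; more than one may be lost
    if not lost:
        return "none", lost
    if len(lost) == 1:
        return "single", lost
    return "multiple", lost  # multiple unrecoverable errors: crash the system
```

Blocks that memory marks as owned by another CPU are deliberately skipped, matching the note above that only blocks lost through errors involving this CPU are counted.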



15.3.3.1.3 Cache and TB Test Procedures 

TBS 

OUTLINE OF TO-BE-SPECIFIED TEST PROCEDURES 

Testing is generally done using the force hit mode of a cache. The code and data 
of the test procedure must reside in I/O space. Assuming memory management is 
enabled during this procedure, the needed PTEs must be in the TB before entering 
force hit mode in the Pcache or Bcache. For the Bcache, testing should be done 
with errors disabled. The ECC logic should be tested thoroughly on one location by 
forcing various check bit patterns and examining the syndrome latched on the read 
(BCEDECC is loaded on every read in Bcache disable-errors mode). Pcache and VIC 
parity checking should be tested by writing bad parity into the arrays. TB testing may 
be accomplished by writing to MTBTAG and MTBPTE (with care to not change any 
TB entry necessary for the test code and data and not to cause two TB entries to exist 
for one address). PROBER and PROBEW (setting PSL<PRV_MOD>) are then used to 
verify the protection bits. Testing the modify bit would be difficult, though approaches 
exist. 

15.3.4 Error Retry 

Error retry is a function of the error notification (machine check or error interrupt), error type, 
and error state. The sections below specify the conditions under which the instruction stream 
may be restarted. 

If retry is to be attempted, the stack must be trimmed of all parameters except the PC/PSL pair. 
This is necessary only for machine checks, because error interrupts do not provide any additional 
parameters on the stack. An REI will then restart the instruction stream and retry the error. 
Some form of software loop control should be provided to limit the possibility of an error loop. 
Note that pending error interrupts may be taken before the retry occurs, depending on the IPL 
of the interrupted or machine checked code. 
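
One possible form of the software loop control mentioned above is a per-location retry counter, sketched here in Python. The class name, the choice of (PC, error code) as the key, and the default limit of three are assumptions made for illustration. This section does not prescribe a particular scheme.

```python
class RetryLimiter:
    """Count retries per (PC, error code) and refuse once a limit is reached."""

    def __init__(self, max_retries=3):
        self.max_retries = max_retries
        self.counts = {}

    def may_retry(self, pc, error_code):
        """Record one more attempt and report whether it is still allowed."""
        key = (pc, error_code)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.max_retries
```

A handler would consult may_retry before trimming the stack and issuing the REI. A production handler would probably also age out old entries so that rare, unrelated errors at the same PC are not counted together.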

Strictly speaking, an REI from a hard or soft error interrupt handler is not a retry since these 
interrupts are recognized between macroinstructions. A machine check exception is an instruction 
abort, and an REI from the handler will cause the failing instruction to be retried (provided retry 
is indicated by analysis). What these cases all have in common is that the interrupted instruction 
stream is restarted. This is only done when the result of error analysis and recovery is such that 
all damaged state has been repaired and there is no reason to suspect that incorrect results will 
be produced if the image is restarted and another error does not occur. 

If complete recovery from one or more errors is not possible (i.e., some state is lost or it is 
impossible to determine what state is lost), possibly the entire system will have to be crashed, a 
single process will have to be deleted, or some other action will have to be taken. Software must 
determine if the error is fatal to the current process, to the processor, or to the entire system, 
and take the appropriate action. 

It is expected that software handles machine checks, soft error interrupts, and hard error 
interrupts independently. For example, after handling a machine check from which retry is to 
occur, software does not check for errors which might cause a pending hard or soft error interrupt. 
The machine check handler is exited via REI (after trimming the machine check information off 
the stack). If the IPL of the machine checked instruction stream is low enough, any pending hard 
or soft error interrupt is taken before the retry occurs. However, if the interrupted instruction 
stream was running at high IPL, then it will continue oblivious of remaining errors. 

15.3.4.1 General Multiple Error Handling Philosophy 

Multiple errors may be reported at the same time. In some cases the NVAX CPU pipeline will 
contain multiple operand prefetches to the same memory block. This can cause multiple errors 
from a single non-transient failure. It could also occur that two separate errors occur at nearly 
the same time and are thus reported simultaneously. 

Multiple error scenarios may be grouped into the following three classes: 

1. Multiple distinct errors for which no error report interferes with the analysis of any other 
(e.g., no lost error bits set). 

2. Multiple errors which could have been caused by the NVAX CPU pipeline issuing more than 
one reference to a given block before the error interrupt or machine check forced a pipeline 
flush. 

3. Multiple errors for which analysis is complicated because the reports interfere with each 
other. 

It is the intent of this chapter to recover from class 1 (above) by simply treating the errors as 
separate and recovering from each in turn. Retry or restart evaluation is based on the cumulative 
result of the recovery and repair procedures for each error. 

For class 2, specific cases are identified in which lost errors are tolerated. These cases are 
selected because the NVAX pipeline can easily cause them (given one error), and because sufficient 
safeguards exist to ensure that correct operation is maintained. Section 15.3.4.2 lists these cases. 

Class 3 scenarios are generally not considered recoverable. The system is simply crashed in those 
cases. 

Note that lost correctable errors are not considered serious problems since hardware recovers 
from those automatically. 

15.3.4.2 Retry Special Cases 

The multiple error scenarios which are handled are listed below. They are made likely by the 
NVAX pipeline's tendency to prefetch operands. The safeguard that exists in all cases is that 
errors inconsistent with correct operation after the error (such as lost data) will invariably cause 
a hard error interrupt or be detectable by the analysis accompanying the machine check or soft 
error interrupt. 

• Lost Bcache data RAM uncorrectable ECC errors and addressing errors. 
(BCEDSTS<LOST_ERR>) 

• Lost Bcache fill errors (timeouts and RDEs). (CEFSTS<LOST_ERR>) 

• Lost NDAL output errors (No-ACKs). (NESTS<LOST_OERR>) 

NOTE 

Retry from a machine check is done even when a hard error interrupt might be pending. 
If the machine checked I-stream were running at high enough IPL, it would not be 
interrupted immediately. Typical hard error causes are write errors. They can not 
cause a machine check. So the fact that a serious error is ignored in the machine 
check retry equation is not considered a problem. The other error would probably have 
occurred anyway and it would not have interrupted the I-stream until IPL was lowered. 

15.4 Console Halt and Halt Interrupt 



A console halt is not an exception, but rather a transfer of control by the NVAX CPU microcode 
directly into console macrocode at the boot ROM address E0040000 (hex). Console halts are 
initiated at powerup, by certain microcode-detected double error conditions, and by the assertion 
of the external halt interrupt pin, HALT_L. 

There is no exception stack frame associated with a console halt. Instead, the SAVPC and SAVPSL 
processor registers provide the necessary information. The format of SAVPC (IPR 42) is shown 
in Figure 15-1. 



Figure 15-1: IPR 2A (hex), SAVPC 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                           Saved PC                                            | :SAVPC
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



The PSL, halt code, MAPEN<0>, and a validity bit are saved in SAVPSL (IPR 43). The format 
of SAVPSL is shown in Figure 15-2. The halt codes are shown in Table 15-2. 

Figure 15-2: IPR 2B (hex), SAVPSL 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  PSL<31:16>                   |  |  |    Halt Code    |       PSL<7:0>        | :SAVPSL
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                 |  |
                                      MAPEN<0> --+  |
                              Invalid SAVPSL if 1 --+



The possible halt codes that may appear in SAVPSL<13:8> are listed in Table 15-2. 



Table 15-2: Console Halt Codes 

Mnemonic                 Code (Hex)   Meaning 

ERR_HLTPIN               02           HALT_L pin asserted 
ERR_PWRUP                03           Initial power up 
ERR_INTSTK               04           Interrupt stack not valid 
ERR_DOUBLE               05           Machine check during exception processing 
ERR_HLTINS               06           HALT instruction in kernel mode 
ERR_ILLVEC               07           Illegal SCB vector (bits <1:0> = 11) 
ERR_WCSVEC               08           WCS SCB vector (bits <1:0> = 10) 
ERR_CHMFI                0A           CHMx on interrupt stack 
ERR_IE0                  10           ACV/TNV during machine check processing 
ERR_IE1                  11           ACV/TNV during kernel-stack-not-valid processing 
ERR_IE2                  12           Machine check during machine check processing 
ERR_IE3                  13           Machine check during kernel-stack-not-valid processing 
ERR_IE_PSL_26_24_101     19           PSL<26:24> = 101 during interrupt or exception 
ERR_IE_PSL_26_24_110     1A           PSL<26:24> = 110 during interrupt or exception 
ERR_IE_PSL_26_24_111     1B           PSL<26:24> = 111 during interrupt or exception 
ERR_REI_PSL_26_24_101    1D           PSL<26:24> = 101 during REI 
ERR_REI_PSL_26_24_110    1E           PSL<26:24> = 110 during REI 
ERR_REI_PSL_26_24_111    1F           PSL<26:24> = 111 during REI 



NOTE 

In certain error conditions detected during the execution of a string instruction, the 
state packup sequence leaves the FPD bit set in the SAVPSL register, but the SAVPC 
register pointing at the instruction following the string instruction, rather than at 
the string instruction itself. If the FPD bit is not set in the SAVPSL register, SAVPC is 
correct. As error halts are not normally restartable, this is not a problem. For a console 
halt due to the assertion of the HALT_L pin, which is the only normally restartable 
console halt, SAVPC is always correct, even if the halt interrupt was detected during 
the execution of a string instruction. 

At the time of the halt, the current stack pointer is saved in the appropriate IPR (0 to 4), 
and SAVPSL<31:16,7:0> are loaded from PSL<31:16,7:0>. SAVPSL<15> is set to MAPEN<0>. 
SAVPSL<14> is set to 0 if the PSL is valid and to 1 if it is not (SAVPSL<14> is undefined after 
a halt due to a system reset). SAVPSL<13:8> is set to the console halt code. 
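
Console code that examines SAVPSL could separate its fields as in the following Python sketch, which follows the bit positions given above and the halt codes of Table 15-2. The function name and the dictionary keys are illustrative choices, not names defined by this specification.

```python
HALT_CODES = {
    0x02: "ERR_HLTPIN", 0x03: "ERR_PWRUP", 0x04: "ERR_INTSTK",
    0x05: "ERR_DOUBLE", 0x06: "ERR_HLTINS", 0x07: "ERR_ILLVEC",
    0x08: "ERR_WCSVEC", 0x0A: "ERR_CHMFI", 0x10: "ERR_IE0",
    0x11: "ERR_IE1", 0x12: "ERR_IE2", 0x13: "ERR_IE3",
    0x19: "ERR_IE_PSL_26_24_101", 0x1A: "ERR_IE_PSL_26_24_110",
    0x1B: "ERR_IE_PSL_26_24_111", 0x1D: "ERR_REI_PSL_26_24_101",
    0x1E: "ERR_REI_PSL_26_24_110", 0x1F: "ERR_REI_PSL_26_24_111",
}


def decode_savpsl(savpsl):
    """Split a 32-bit SAVPSL value into the fields described above."""
    code = (savpsl >> 8) & 0x3F                  # SAVPSL<13:8>
    return {
        "psl_high": (savpsl >> 16) & 0xFFFF,     # PSL<31:16>
        "mapen": (savpsl >> 15) & 1,             # MAPEN<0>
        "psl_invalid": bool((savpsl >> 14) & 1), # 1 means the PSL is not valid
        "halt_code": code,
        "halt_name": HALT_CODES.get(code, "unknown"),
        "psl_low": savpsl & 0xFF,                # PSL<7:0>
    }
```

Note that psl_invalid should not be trusted after a halt due to a system reset, since SAVPSL<14> is undefined in that case.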

To complete the hardware restart sequence and thereby pass control to the console macrocode, 
the state shown in Table 15-3 is initialized. 

Table 15-3: CPU State Initialized on Console Halt 

State          Initialized Value 

SP             IPR 4 (IS) 
PSL            041F0000 (hex) 
PC             E0040000 (hex) 
MAPEN          0 
ICCS           0 (after reset, code=3, only) 
SISR           0 (after reset, code=3, only) 
ASTLVL         4 (after reset, code=3, only) 
PAMODE         0 (after reset, code=3, only) 
BPCR<31:16>    FECA (hex) (after reset, code=3, only) 
CPUID          0 (after reset, code=3, only) 
all else       undefined 

15.5 Machine Checks 

The machine check exception indicates a serious system error. Under certain conditions, the error 
may be recoverable by restarting the instruction. The recoverability is a function of the machine 
check code, the VAX Restart bit (VR) in the machine check stack frame, the opcode, the state of 
PSL<FPD>, the state of certain second-error bits in internal error registers, and most probably, 
the external error state. 

A machine check results from an internally detected consistency error (e.g., the microcode reaches 
an "impossible" state), or a hardware detected error (e.g., an uncorrectable Bcache ECC error on 
a data read). 

A machine check is technically a macro instruction abort. The NVAX CPU microcode attempts to 
convert the condition to a fault by unwinding the current instruction, but there is no guarantee 
that the instruction can be properly restarted. As much diagnostic information as possible is 
pushed on the stack and provided in other error registers. The rest of the error parsing is then 
left to the operating system. 

When the software machine check handler receives control, it must explicitly acknowledge receipt 
of the machine check via a write of any value to the MCESR processor register with the following 
instruction: 

MTPR #0, #PR19$_MCESR 

Figure 15-3: IPR 26 (hex), MCESR 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x| :MCESR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



15.5.1 Machine Check Stack Frame 

The machine check stack frame is shown in Figure 15-4. The fields of the stack frame are 
described in Table 15-4, and the possible machine check codes are listed in Table 15-5. The 
contents of all fields not explicitly defined in Table 15-4 are UNDEFINED. 

Figure 15-4: Machine Check Stack Frame 

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  24 (byte count of parameters, not including this longword)                   | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ASTLVL | x  x  x  x  x|  Machine Check Code   | x  x  x  x  x  x  x  x|         CPUID         |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                       INT.SYS register                                        |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                        SAVEPC register                                        |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                          VA register                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                          Q register                                           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|    Rn     | x  x|Mode |        Opcode         | x  x  x  x  x  x  x  x|VR| x  x  x  x  x  x  x|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00



Table 15-4: Machine Check Stack Frame Fields 

Longword Bits Contents 

(SP)+0 31:0 Byte count — This longword contains the size of the stack frame in bytes, not 

including the PC, PSL, or the byte count longword. Stack frame PC and PSL 
values should always be referenced using this count as an offset from the stack 
pointer. 



(SP)+4 31:29 ASTLVL— This field contains the current value of the VAX ASTLVL register. 

23:16 Machine check code — This longword contains the reason for the machine check, 

as listed in Table 15-5. 

7:0 CPUID— This field contains the current value of the VAX CPUED register. 



(SP)+8 31:0 INT.SYS register— This longword contains the value of the INT.SYS register 

and read onto the Abus by the microcode. The fields in this register are 
described in Chapter 10. 



(SP)+12 31:0 SAVEPC— This field contains the SAVEPC register which is loaded by microcode 

with the PC value in certain circumstances. It is used in error handling for PTE 
read errors with PSL<PPD> set in this stack frame. 



(SP)+16   31:0   VA register: This longword contains the contents of the Ebox VA register, which
                 may be loaded from the output of the ALU.

(SP)+20   31:0   Q register: This longword contains the contents of the Ebox Q register, which
                 may be loaded from the output of the shifter.

(SP)+24   31:28  Rn: This field contains the value of the Rn register, which is used to obtain the
                 register number for the CVTPL and EDIV instructions. In general, the value
                 of this field is UNPREDICTABLE.

          25:24  Mode: This field contains a copy of PSL<CUR_MOD>.

          23:16  Opcode: This field contains bits <7:0> of the instruction opcode. The FD bit is
                 not included.

          7      VR: This field contains the "VAX Restart" bit, which is used to communicate
                 restart information between the microcode and the operating system. If this
                 bit is set, no architectural state has been changed by the instruction which was
                 executing when the error was detected. If this bit is not set, architectural state
                 was modified by the instruction.
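
The field positions in Table 15-4 can be sketched as a small decoder. This is only an illustration, not part of the chip or of any operating system: the function name and the convention of passing the frame as a list of longwords (lowest address first) are assumptions made for the example. The PC and PSL are located through the byte count, as the table recommends.

```python
def decode_mchk_frame(frame: list[int]) -> dict:
    """Decode a machine check stack frame (hypothetical helper).

    `frame` holds 32-bit longwords starting at (SP)+0, lowest address first.
    """
    byte_count = frame[0]
    lw4, lw24 = frame[1], frame[6]
    # The byte count excludes itself, the PC, and the PSL, so the PC
    # sits immediately after `byte_count` bytes of frame contents.
    pc_index = 1 + byte_count // 4
    return {
        "byte_count": byte_count,
        "astlvl": (lw4 >> 29) & 0x7,      # (SP)+4  <31:29>
        "mchk_code": (lw4 >> 16) & 0xFF,  # (SP)+4  <23:16>
        "cpuid": lw4 & 0xFF,              # (SP)+4  <7:0>
        "int_sys": frame[2],              # (SP)+8
        "savepc": frame[3],               # (SP)+12
        "va": frame[4],                   # (SP)+16
        "q": frame[5],                    # (SP)+20
        "rn": (lw24 >> 28) & 0xF,         # (SP)+24 <31:28>
        "mode": (lw24 >> 24) & 0x3,       # (SP)+24 <25:24>
        "opcode": (lw24 >> 16) & 0xFF,    # (SP)+24 <23:16>
        "vr": (lw24 >> 7) & 0x1,          # (SP)+24 <7>
        "pc": frame[pc_index],
        "psl": frame[pc_index + 1],
    }
```

Indexing the PC and PSL through the byte count, rather than through a fixed offset, keeps the decoder valid if the frame size differs from the layout shown here.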



Table 15-5: Machine Check Codes

Mnemonic               Code (Hex)  Meaning

MCHK_UNKNOWN_MSTATUS   01          Unknown memory management fault parameter
                                   returned by the Mbox (see Section 15.5.2.1)

MCHK_INT.ID_VALUE      02          Illegal interrupt ID value returned in INT.SYS (see
                                   Section 15.5.2.2)

MCHK_CANT_GET_HERE     03          Illegal microcode dispatch occurred (see
                                   Section 15.5.2.3)

MCHK_MOVC.STATUS       04          Illegal combination of state bits detected during
                                   string instruction (see Section 15.5.2.4)

MCHK_ASYNC_ERROR       05          Asynchronous hardware error occurred (see
                                   Section 15.5.2.5)

MCHK_SYNC_ERROR        06          Synchronous hardware error occurred (see
                                   Section 15.5.2.6)
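
A handler might represent the codes of Table 15-5 as an enumeration. The sketch below is illustrative only: the class and function names are hypothetical, and the dots in MCHK_INT.ID_VALUE and MCHK_MOVC.STATUS are written as underscores because a Python identifier cannot contain a dot.

```python
from enum import IntEnum


class MchkCode(IntEnum):
    """Machine check codes from Table 15-5 (values in hex)."""
    UNKNOWN_MSTATUS = 0x01
    INT_ID_VALUE = 0x02    # MCHK_INT.ID_VALUE in the table
    CANT_GET_HERE = 0x03
    MOVC_STATUS = 0x04     # MCHK_MOVC.STATUS in the table
    ASYNC_ERROR = 0x05
    SYNC_ERROR = 0x06


def classify_code(code: int) -> str:
    """Return the mnemonic for a code, or flag an unknown code."""
    try:
        return "MCHK_" + MchkCode(code).name
    except ValueError:
        # Any other value is an inconsistent status (Section 15.5.2.7).
        return "inconsistent status (unknown machine check code)"
```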



15.5.2 Events Reported Via Machine Check Exceptions 

This section describes all the errors which can cause a machine check exception. A parse tree is 
given which shows how to determine the cause of a given machine check. After that, there is a 
description of each error. For each error, the recovery procedure is given. Where appropriate, the 
conditions for retry are given. See Section 15.3.3 and Section 15.3.4 for more on error recovery 
and error retry.

Figure 15-5 is a parse tree which should be used to analyze the cause of a machine check
exception. The errors shown in the parse tree are described in detail in the sections following
the figure. The section is indicated in parentheses with each error. Note that it is assumed
that the state being analyzed is the saved state, as described in Section 15.3.1. Otherwise the
state could change during the analysis procedure, leading to possibly incorrect conclusions. (See
Section 15.3.2 for general information about error analysis.)

Figure 15-5: Cause Parse Tree for Machine Check Exceptions

MACHINE CHECK (select one)
  MCHK_UNKNOWN_MSTATUS
    -> Unknown memory management status error (Section 15.5.2.1)
  MCHK_INT.ID_VALUE
    -> Illegal interrupt ID error (Section 15.5.2.2)
  MCHK_CANT_GET_HERE
    -> Presumed impossible microcode address reached (Section 15.5.2.3)
  MCHK_MOVC.STATUS
    -> MOVCx status encoding error (Section 15.5.2.4)
  MCHK_ASYNC_ERROR (select all, at least one)
    S_TBSTS<LOCK> (select all)
      S_TBSTS<DPERR>
        -> TB PTE data parity error (Section 15.5.2.5.1)
      S_TBSTS<TPERR>
        -> TB tag parity error (Section 15.5.2.5.1)
      none of the above
        -> Inconsistent status (no TBSTS error bits set) (Section 15.5.2.7)
    S_ECR<S3_STALL_TIMEOUT>
      -> S3 stall timeout error (Section 15.5.2.5.2)
    none of the above
      -> Inconsistent status (no asynchronous machine check error set) (Section 15.5.2.7)
  MCHK_SYNC_ERROR (select all, at least one)
    S_ICSR<LOCK> (select all, at least one)
      S_ICSR<DPERR>
        -> VIC (virtual instruction cache) data parity error (Section 15.5.2.6.1)
      S_ICSR<TPERR>
        -> VIC tag parity error (Section 15.5.2.6.1)
      none of the above
        -> Inconsistent status (no ICSR error bits set) (Section 15.5.2.7)
    S_BCEDSTS<LOCK> AND NOT S_PCSTS<PTE_ER> (select one)
      S_BCEDSTS<BAD_ADDR> (select one)
        S_BCEDSTS<DR_CMD>=DREAD
          -> Bcache data RAM addressing error on D-stream read or read-lock
             (Section 15.5.2.6.2)
        S_BCEDSTS<DR_CMD>=IREAD
          -> Bcache data RAM addressing error on I-stream read (Section 15.5.2.6.2)
        otherwise
          -> Not a synchronous machine check cause (see soft and hard error interrupt events)
      S_BCEDSTS<UNCORR> (select one)
        S_BCEDSTS<DR_CMD>=DREAD
          -> Bcache data RAM uncorrectable ECC error on D-stream read or read-lock
             (Section 15.5.2.6.2)
        S_BCEDSTS<DR_CMD>=IREAD
          -> Bcache data RAM uncorrectable ECC error on I-stream read (Section 15.5.2.6.2)
        otherwise
          -> Not a synchronous machine check cause (see soft and hard error interrupt events)
      none of the above
        -> Inconsistent status (no BCEDSTS unrecoverable error bits set) (Section 15.5.2.7)
    S_BCEDSTS<LOST_ERR> AND NOT S_PCSTS<PTE_ER>
      -> Lost unrecoverable Bcache data RAM error (Section 15.5.2.6.3)
    S_CEFSTS<LOCK> AND NOT S_PCSTS<PTE_ER> (select one)
      S_CEFSTS<TIMEOUT> (select one)
        S_CEFSTS<TO_MBOX> AND (NOT S_CEFSTS<REQ_FILL_DONE>) (select one)
          S_CEFSTS<IREAD>
            -> I-stream NDAL read timeout error (Section 15.5.2.6.4)
          S_CEFSTS<OREAD>
            -> D-stream NDAL ownership read timeout error (Section 15.5.2.6.4)
          otherwise
            -> D-stream NDAL read timeout error (read only operand) (Section 15.5.2.6.4)
        otherwise
          -> Not a synchronous machine check cause (see soft and hard error interrupt events)
      S_CEFSTS<RDE> (select one)
        S_CEFSTS<TO_MBOX> AND (NOT S_CEFSTS<REQ_FILL_DONE>) (select one)
          S_CEFSTS<IREAD>
            -> I-stream NDAL read data error (Section 15.5.2.6.5)
          S_CEFSTS<OREAD>
            -> D-stream NDAL ownership read data error (modify operand or read-lock)
               (Section 15.5.2.6.5)
          otherwise
            -> D-stream NDAL read data error (read only operand) (Section 15.5.2.6.5)
        otherwise
          -> Not a synchronous machine check cause (see soft and hard error interrupt events)
      S_CEFSTS<UNEXPECTED_FILL>
        -> Not a synchronous machine check cause (see soft error interrupt events)
      otherwise
        -> Inconsistent status (either CEFSTS<RDE>, CEFSTS<TIMEOUT>, or
           CEFSTS<UNEXPECTED_FILL> should be set) (Section 15.5.2.7)
    S_CEFSTS<LOST_ERR> AND NOT S_PCSTS<PTE_ER>
      -> Lost Bcache fill error (Section 15.5.2.6.6)
    S_NESTS<NOACK> AND NOT S_PCSTS<PTE_ER> (select one)
      S_NEOCMD<CMD>=IREAD
        -> Unacknowledged I-stream NDAL read (Section 15.5.2.6.7)
      S_NEOCMD<CMD>=DREAD
        -> Unacknowledged D-stream NDAL read (read only operand) (Section 15.5.2.6.7)
      S_NEOCMD<CMD>=OREAD
        -> Unacknowledged D-stream NDAL read (modify operand or read-lock)
           (Section 15.5.2.6.7)
      S_NEOCMD<CMD>=WRITE OR WDISOWN
        -> Not a synchronous machine check cause (see hard error interrupt events)
      otherwise
        -> Inconsistent status (invalid command in NEOCMD<CMD>) (Section 15.5.2.7)
    S_NESTS<LOST_OERR> AND NOT S_PCSTS<PTE_ER>
      -> Lost unrecoverable NDAL output error (Section 15.5.2.6.8)
    S_BCEDSTS<LOCK> AND S_PCSTS<PTE_ER> [1] (select one)
      S_BCEDSTS<BAD_ADDR> (select one)
        S_BCEDSTS<DR_CMD>=DREAD
          -> Bcache data RAM addressing error on PTE read (Section 15.5.2.6.9.2)
        S_BCEDSTS<DR_CMD>=IREAD (select one)
          S_BCEDSTS<LOST_ERR>
            -> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
          otherwise
            -> Bcache data RAM addressing error on I-stream read (Section 15.5.2.6.2)
        otherwise (select one)
          S_BCEDSTS<LOST_ERR>
            -> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
          otherwise
            -> Not a synchronous machine check cause (see soft and hard error interrupt events)
      S_BCEDSTS<UNCORR> (select one)
        S_BCEDSTS<DR_CMD>=DREAD
          -> Bcache data RAM uncorrectable ECC error on PTE read (Section 15.5.2.6.9.2)
        S_BCEDSTS<DR_CMD>=IREAD (select one)
          S_BCEDSTS<LOST_ERR>
            -> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
          otherwise
            -> Bcache data RAM uncorrectable error on I-stream read (Section 15.5.2.6.2)
        otherwise (select one)
          S_BCEDSTS<LOST_ERR>
            -> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
          otherwise
            -> Not a synchronous machine check cause (see soft and hard error interrupt events)
      none of the above
        -> Inconsistent status (no BCEDSTS unrecoverable error bits set) (Section 15.5.2.7)

[1] At least one potential PTE cause must be found or the status is inconsistent (see Section 15.5.2.7).
Some of the outcomes indicate a potential synchronous machine check cause which is not a potential PTE
read error cause. These errors should be treated separately.

    S_CEFSTS<LOCK> AND S_PCSTS<PTE_ER> [1] (select one)
      S_CEFSTS<TIMEOUT> (select one)
        S_CEFSTS<TO_MBOX> AND (NOT S_CEFSTS<REQ_FILL_DONE>) (select one)
          S_CEFSTS<IREAD> (select one)
            S_CEFSTS<LOST_ERR>
              -> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
            otherwise
              -> I-stream NDAL read timeout error (Section 15.5.2.6.4)
          S_CEFSTS<OREAD> (select one)
            S_CEFSTS<LOST_ERR>
              -> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
            otherwise
              -> D-stream NDAL ownership read timeout error (Section 15.5.2.6.4)
          otherwise
            -> D-stream NDAL read timeout error (PTE read) (Section 15.5.2.6.9.3)
        otherwise (select one)
          S_CEFSTS<LOST_ERR>
            -> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
          otherwise
            -> Not a synchronous machine check cause (see soft and hard error interrupt events)
      S_CEFSTS<RDE> (select one)
        S_CEFSTS<TO_MBOX> AND (NOT S_CEFSTS<REQ_FILL_DONE>) (select one)
          S_CEFSTS<IREAD> (select one)
            S_CEFSTS<LOST_ERR>
              -> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
            otherwise
              -> I-stream NDAL read data error (Section 15.5.2.6.5)
          S_CEFSTS<OREAD> (select one)
            S_CEFSTS<LOST_ERR>
              -> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
            otherwise
              -> D-stream NDAL ownership read data error (Section 15.5.2.6.5)
          otherwise
            -> D-stream NDAL read data error (PTE read) (Section 15.5.2.6.9.4)
        otherwise (select one)
          S_CEFSTS<LOST_ERR>
            -> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
          otherwise
            -> Not a synchronous machine check cause (see soft and hard error interrupt events)
      S_CEFSTS<UNEXPECTED_FILL> (select one)
        S_CEFSTS<LOST_ERR>
          -> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
        otherwise
          -> Not a synchronous machine check cause (see hard error interrupt events)
      otherwise
        -> Inconsistent status (either CEFSTS<RDE>, CEFSTS<TIMEOUT>, or
           CEFSTS<UNEXPECTED_FILL> should be set) (Section 15.5.2.7)
    S_NESTS<NOACK> AND S_PCSTS<PTE_ER> [1] (select one)
      S_NEOCMD<CMD>=IREAD (select one)
        S_NESTS<LOST_OERR>
          -> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
        otherwise
          -> Unacknowledged I-stream NDAL read (Section 15.5.2.6.7)
      S_NEOCMD<CMD>=DREAD
        -> Unacknowledged D-stream NDAL read (PTE read) (Section 15.5.2.6.9.5)
      S_NEOCMD<CMD>=OREAD (select one)
        S_NESTS<LOST_OERR>
          -> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
        otherwise
          -> Unacknowledged D-stream NDAL read (modify operand or read-lock)
             (Section 15.5.2.6.7)
      S_NEOCMD<CMD>=WRITE OR WDISOWN (select one)
        S_NESTS<LOST_OERR>
          -> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
        otherwise
          -> Not a synchronous machine check cause (see hard error interrupt events)
      otherwise
        -> Inconsistent status (invalid command in NEOCMD<CMD>) (Section 15.5.2.7)
    none of the above
      -> Inconsistent status (no cause found for synchronous machine check)
         (Section 15.5.2.7)
  otherwise
    -> Inconsistent status (unknown machine check code) (Section 15.5.2.7)

Notation:

(select one)                - Exactly one case must be true. If zero or more than one is
                              true, the status is inconsistent.
(select all)                - More than one case may be true.
(select all, at least one)  - All the cases are possible causes of a particular machine check.
                              More than one may be true. At least one must be true or the status
                              is inconsistent. A case is not considered true if it evaluates to
                              "Not a machine check cause".
otherwise                   - Fall-through case for (select one) if no other case is true.
none of the above           - Fall-through case for (select all) or (select all, at least one)
                              if no other case is true.

NOTE

References to VR and PSL<FPD> in the "retry condition" parts of the following
descriptions of machine check causes should be understood to refer to the named bit
in the machine check stack frame.

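
The selection rules in the figure's notation can be sketched in code. The example below is illustrative only: it implements the (select one) rule as a small helper and walks the MCHK_ASYNC_ERROR branch of the parse tree. The function names, the dictionary used to hold the saved S_TBSTS bits, and the outcome strings are assumptions made for the example.

```python
INCONSISTENT = "Inconsistent status (Section 15.5.2.7)"


def select_one(cases, otherwise=None):
    """(select one): exactly one (condition, outcome) case must be true.

    `otherwise` is the fall-through outcome used when no case is true.
    """
    hits = [outcome for is_true, outcome in cases if is_true]
    if len(hits) == 1:
        return hits[0]
    if not hits and otherwise is not None:
        return otherwise
    return INCONSISTENT


def async_error_causes(tbsts, s3_stall_timeout):
    """Walk the MCHK_ASYNC_ERROR branch (select all, at least one).

    `tbsts` maps 'LOCK', 'DPERR', 'TPERR' to booleans from saved S_TBSTS.
    """
    causes = []
    if tbsts["LOCK"]:
        tb = []
        if tbsts["DPERR"]:
            tb.append("TB PTE data parity error")
        if tbsts["TPERR"]:
            tb.append("TB tag parity error")
        # "none of the above" under S_TBSTS<LOCK>
        causes.extend(tb or [INCONSISTENT])
    if s3_stall_timeout:
        causes.append("S3 stall timeout error")
    # "none of the above" at the MCHK_ASYNC_ERROR level
    return causes or [INCONSISTENT]
```

All register values passed in should come from the saved state, as the text preceding the figure requires.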
15.5.2.1 MCHK_UNKNOWN_MSTATUS

Description: An unknown memory management status was returned from the Mbox in response 
to a microcode memory management probe. This is probably due to an internal error in the Mbox, 
Ebox, or microsequencer. 

Recovery procedures: No explicit error recovery is required in response to this error. 

Retry condition: This error can only happen in microcode processing of memory management 
faults for a virtual memory reference. Retry if: 



(VR = 1) OR (PSL<FPD> = 1). 
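
The retry condition above recurs throughout the following sections and can be sketched as a one-line predicate. This is an illustration only; the function name is hypothetical. Both values are taken from the machine check stack frame, and the FPD position (bit 27) is the one defined for the PSL by the VAX architecture.

```python
PSL_FPD = 1 << 27  # First Part Done bit of the VAX PSL


def retry_allowed(vr: int, psl: int) -> bool:
    """Evaluate (VR = 1) OR (PSL<FPD> = 1) on stack frame values."""
    return vr == 1 or bool(psl & PSL_FPD)
```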



15.5.2.2 MCHK_INT.ID_VALUE



Description: An illegal interrupt ID was returned in INT.SYS during interrupt processing 
in microcode. This is probably due to an internal error in the interrupt hardware, Ebox, or 
microsequencer. 

Recovery procedures: No explicit error recovery is required in response to this error. 

Retry condition: This error can only happen in microcode processing of interrupts, which occurs
between instructions or in the middle of interruptible instructions. Retry if:

(VR = 1) OR (PSL<FPD> = 1). 

15.5.2.3 MCHK_CANT_GET_HERE

Description: Microcode execution reached a presumably impossible address. This is probably 
due to a microcode bug or an internal error in the Ebox or microsequencer. 

Recovery procedures: No explicit error recovery is required in response to this error. 

Retry condition: Retry if: 

(VR = 1) OR (PSL<FPD> = 1). 

15.5.2.4 MCHK_MOVC.STATUS

Description: During the execution of MOVCx, the two state bits that encode the state of the 
move (forward, backward, fill) were found set to the fourth (illegal) combination. This is probably 
due to an internal error in the Ebox or microsequencer. 

Recovery procedures: No explicit error recovery is required in response to this error. 

Retry condition: Because the state bits encode the operation, the instruction cannot be
restarted in the middle of the MOVCx. If software can determine that no specifiers have been 
over-written (MOVCx destroys R0-R5 and memory due to string writes), the instruction may be 
restarted from the beginning by clearing PSL<FPD>. This should be done only if the source and 
destination strings do not overlap and if: 

(PSL<FPD> = 1). 
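
The overlap and PSL<FPD> parts of this condition can be sketched as follows. This is only an illustration with hypothetical function names. It does not cover the separate check that no specifiers have been overwritten, which software must make on its own. The FPD position (bit 27) is the one defined for the PSL by the VAX architecture.

```python
PSL_FPD = 1 << 27  # First Part Done bit of the VAX PSL


def strings_overlap(src: int, src_len: int, dst: int, dst_len: int) -> bool:
    """True if [src, src+src_len) and [dst, dst+dst_len) share any byte."""
    if src_len == 0 or dst_len == 0:
        return False
    return src < dst + dst_len and dst < src + src_len


def movc_restart_candidate(psl: int, src: int, src_len: int,
                           dst: int, dst_len: int) -> bool:
    """Restart from the beginning only with FPD set and disjoint strings."""
    return bool(psl & PSL_FPD) and not strings_overlap(src, src_len, dst, dst_len)
```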

15.5.2.5 MCHK_ASYNC_ERROR

This machine check code reports serious errors which interrupt the microcode at an arbitrary 
point. Many internal machine states (e.g., bits in the PSL, the PC or SP) are questionable. 
Recovery is typically not possible. 

15.5.2.5.1 TB Parity Errors 

Description: Parity errors in tags and PTE data in the TB cause an asynchronous machine 
check by directly forcing a microtrap in the microsequencer. The reference being processed by 
the Mbox may be for an explicit Ebox reference, an operand prefetch or DEST_ADDR reference
from the specifier queue, or an instruction prefetch from the IREF latch. Also the reference could 
be a read generated by the Mbox within a TB miss for a process space virtual address since 
process page tables are stored in virtual memory (system space). 

Description (TB PTE Data Parity Error): The PTE data portion of a TB entry which hit had a
parity error.

Description (TB Tag Parity Error): The tag portion of a TB entry which hit had a parity error.

Recovery procedures: To recover, clear TBSTS<LOCK>. 

Retry condition: Since the Ibox is nearly always able to issue instruction prefetches, TB parity 
errors could occur at practically any time. This makes it impossible to determine what machine 
state is incorrect. There is no guarantee that all writes with a different PSL<CUR_MOD> 
completed successfully. Therefore even the stack frame PSL<CUR_MOD> can't be used to 
determine whether system data is uncorrupted. 

So retry is not possible. Crash the system. 

15.5.2.5.2 Ebox S3 Stall Timeout Error 

Description: S3 stall timeout errors occur when the Ebox microcode is stalled waiting for some 
result or action which will probably never occur. S4 stalls in the Ebox cause S3 stalls and therefore 
can lead to S3 stall timeout. Additionally, field queue stall and instruction queue stall can cause 
this timeout. (These last two situations are not Ebox pipeline stalls, but they are similar in 
effect.) The timeout can occur in any microflow for a number of reasons. Machine state may be 
corrupted. This timeout is probably due to an internal error in the NVAX CPU such that one 
box is waiting for another to do something which it isn't going to do. An example would be if the 
Ebox microcode expected one more source specifier than the Ibox delivered. The Ebox will stall 
until the timeout occurs waiting for the Ibox to deliver one more source operand via the source 
queue. 

S3 timeout errors can be caused by failures of various pipeline control circuits in the Ebox. Also 
a deadlock within a box or across multiple boxes can cause this error. 

Recovery procedures: To recover, clear the S3_STALL_TIMEOUT bit in ECR. 

Retry condition: Because this error can occur at any time, it is not possible to determine what 
machine state is incorrect. Also, this error should never happen and indicates either a serious 
failure in the NVAX CPU chip or a design bug. So retry is not possible. Crash the system. 

15.5.2.6 MCHK_SYNC_ERROR

This machine check code reports errors which occur in memory or IO space instruction fetches or
data reads. Except in the case of PTE read errors, core machine state should be consistent since
microcode has to explicitly access an operand or instruction in order to incur this error. Microcode
does not access memory results or dispatch for a new instruction execution with core machine 
state in an inconsistent state. 

PTE read errors on write transactions can cause a microtrap at an arbitrary time, and so core 
machine state may be inconsistent. 

Many of the error events described below for synchronous machine check are possible causes. If 
more than one is present, there is no way to determine which actually caused the machine check. 
If exactly one possible cause is discovered, then the machine check may be attributed to that 
cause. The reason multiple causes may be present is that the NVAX CPU prefetches instructions 
and data. If the CPU branches or takes an exception before using data it has requested, then 
the pending machine check is taken as a soft error interrupt (though it might not be recoverable 
in the final analysis). 

If multiple errors occur, recovery and retry may be possible. It is recommended that retry from 
multiple errors be done only if one error report does not interfere with analysis of, and recovery 
from, another error. 

An example of such interference is when S_BCEDSTS reports a Bcache data RAM uncorrectable 
error on a writeback while S_NESTS is reporting a NDAL command no-ACK error. Normally, 
S_NESTS<BADWDATA> would be reported by the writeback error and S_NEOADR would report 
the address of the lost writeback. The no-ACK error makes recovery from the writeback error 
much more difficult. But it is unlikely that these two errors would occur together since
they are understood to be uncorrelated events. So this case is considered unrecoverable. 

If two errors are entirely separate, neither interfering with the analysis and recovery of the 
other, then it is acceptable to retry from these errors provided all the error analyses and recovery 
procedures result in a retry indication.

In several cases, lost errors are tolerated. See Section 15.3.4.2 for a list of these special cases. 
In each case, the strong tendency to prefetch data exhibited by the NVAX pipeline makes the 
particular lost error likely, given that one error of that kind occurred. Also, in each case, if data 
is lost in the lost error, a hard error interrupt is posted. So these errors are tolerated as long as 
they do not cause a hard error interrupt. 

Errors in opcode or operand specifier fetching are always detected before architecturally visible 
state within the CPU is modified. This means the VR bit from the machine check stack frame 
should be 1. This error handling analysis attempts to recover from multiple errors, so the retry 
condition for each error is made as general as possible. If the machine check handler finds only 
errors of the kind listed here, then VR should be 1 and it is an inconsistent report if it is not (see 
Section 15.5.2.7). 

• VIC parity errors. 

• Bcache data RAM uncorrectable ECC and addressing errors in I-stream reads. 

• Bcache timeout errors and fill read data errors in I-stream reads. 

• Unacknowledged NDAL I-stream reads 
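
The VR consistency rule for the error kinds listed above can be sketched as a check. This is an illustration only: the cause labels and the function name are hypothetical, chosen for the example rather than taken from the parse tree text.

```python
# Hypothetical labels for the I-stream error kinds in the list above.
ISTREAM_FETCH_CAUSES = {
    "VIC parity error",
    "Bcache I-stream uncorrectable ECC or addressing error",
    "Bcache I-stream timeout or fill read data error",
    "Unacknowledged NDAL I-stream read",
}


def vr_report_consistent(causes, vr: int) -> bool:
    """If every cause found is an I-stream fetch error, VR must be 1.

    Returns False when the report is inconsistent (Section 15.5.2.7).
    """
    if causes and all(c in ISTREAM_FETCH_CAUSES for c in causes):
        return vr == 1
    return True
```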

15.5.2.6.1 VIC Parity Errors

Description: A parity error was detected in the VIC tag or data store in the Ibox. VIC parity 
errors cause a machine check when the Ebox microcode requests dispatch to a new instruction 
execution microflow or attempts to access an operand within an instruction execution microflow. 

VIC Data Parity Errors: A parity error occurred in the data portion of the VIC. 

VIC Tag Parity Errors: A parity error occurred in the tag portion of the VIC.

In all cases, the quadword virtual address of the error is in VMAR. 

Pending Interrupts: A soft error interrupt should be pending. 

Recovery procedures: To recover, disable and flush the VIC by re-writing all the tags (using
the procedure in Section 15.3.3.1.1.1). Also, clear ICSR<LOCK>.

Retry condition: Retry if: 

(VR = 1) OR (PSL<FPD> = 1). 

15.5.2.6.2 Bcache Data RAM Uncorrectable ECC Errors and Addressing Errors 

Description (addressing errors): A Bcache addressing error was detected by the Cbox in an 
I-stream or D-stream read during a Bcache hit. Addressing errors are the result of a mismatch 
between the address the Cbox drives to the RAMs for a read access and the address used to write 
that location. A multiple bit data error can appear to be an addressing error, though it is extremely
unlikely. 

Description (uncorrectable ECC errors): A Bcache uncorrectable data error was detected by 
the Cbox in an I-stream or D-stream read during a Bcache hit. Uncorrectable data errors are the 
result of a multiple bit error in the data read from the Bcache. An addressing error with a single 
bit data error will appear as an uncorrectable data error. 

Description (all cases): The Bcache is in ETM. S_BCEDIDX contains the cache index of the
error, and S_BCEDECC contains the syndrome calculated by the ECC logic. 

The physical address of the reference can be found by reading the tag for the data block (using 
the procedure in Section 15.3.3.1.2.4). (If the physical address is found to be in IO space, it is an
inconsistent status. See Section 15.5.2.7.) If the block's tag is found to contain an uncorrectable
ECC error, then the address cannot be determined.

It should never be the case that both S_BCEDSTS<BAD_ADDR> and S_BCEDSTS<UNCORR> 
are set. If they are, it is an inconsistent status (see Section 15.5.2.7). 

Pending Interrupts: A soft error interrupt should be pending. 

Recovery procedures (addressing errors): To recover, clear BCEDSTS<LOCK, BAD_ADDR>.

Recovery procedures (uncorrectable ECC errors): To recover, clear BCEDSTS<LOCK,
UNCORR>.

Recovery procedures (both cases): Flush the Bcache. Clear CCTL<HW_ETM> (after flushing 
the Bcache). If the data is owned by the Bcache and if the error repeats itself (is not transient), 
then a writeback error will result from the flush procedure. Software should prepare for this by 
clearing NESTS and BCEDSTS errors. 

Retry condition: If no writeback error occurs in the Bcache flush, retry if:

(VR = 1) OR (PSL<FPD> = 1). 

If a writeback error occurs in the Bcache flush, then the data is presumed to be unrecoverable. See 
Section 15.8.1.10 for a description of handling an error in a writeback. Given that the address is 
available (no error in the tag store), software should determine if the error is fatal to one process 
or the whole system and take appropriate action. Otherwise, crash the system. 

15.5.2.6.3 Bcache Lost Data RAM Access Error 

Description: A lost Bcache data RAM error may have been a machine check cause. It also
might not have been. Lost Bcache data RAM errors which cause machine checks are always read 
errors, and can be retried unless the aborted instruction has altered essential state. Whether or 
not it is a machine check cause, the error will have caused either a soft or hard error interrupt. 
Lost Bcache data RAM errors which can not have caused a machine check are dealt with in the 
sections on hard and soft error interrupts. 

Lost Bcache data RAM errors may be caused by more than one operand prefetch to the same 
cache block. 

Recovery for lost Bcache data RAM errors depends on whether the pending interrupt is a hard or 
soft error interrupt. The machine check error handling software should defer recovery until the 
expected hard or soft error interrupt occurs. Once the interrupt is taken, the error recovery and 
restart instructions found in the hard error interrupt and soft error interrupt sections should be 
referenced. See Section 15.7.1.3.2 and Section 15.8.1.15. 

Software should employ some mechanism to record that an interrupt for a lost Bcache data RAM 
error is pending. This mechanism should allow detection of a case in which an expected interrupt 
does not occur (once IPL is lowered). If the expected interrupt does not occur when IPL is lowered, 
then a serious inconsistency exists and the system should be crashed. 
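
One way to record the pending interrupt is a simple flag that the machine check handler sets and the interrupt handler clears. The sketch below is illustrative only: the class and method names are hypothetical, and a real handler would keep this state per CPU and check it at the point where IPL is lowered.

```python
class PendingErrorInterrupt:
    """Tracks an expected hard or soft error interrupt (hypothetical helper)."""

    def __init__(self) -> None:
        self.expected = False

    def expect(self) -> None:
        # Called by the machine check handler when it defers recovery.
        self.expected = True

    def interrupt_taken(self) -> None:
        # Called by the hard or soft error interrupt handler.
        self.expected = False

    def consistent_after_ipl_lowered(self) -> bool:
        # False means the expected interrupt never arrived: the text
        # above calls for crashing the system in that case.
        return not self.expected
```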

The Bcache is in ETM.

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both. 

Recovery procedures: No specific recovery action is required. 

Note that BCEDSTS<LOST_ERR> is not cleared. It will be cleared by the hard or soft error 
interrupt handler. Also, the Bcache must remain in ETM until the error interrupt occurs. 

Retry condition: Retry only if: 

(VR = 1) OR (PSL<FPD> = 1). 

15.5.2.6.4 NDAL I-Stream or D-Stream Read or D-Stream Ownership Read Timeout Errors

Description: An I-stream or D-stream read or D-stream ownership read timed out before any 
fill quadword was received. This is not an accepted means for a system environment to notify the 
NVAX CPU of "non-existent memory or IO location". The error could be caused by an error in the
system environment or an NDAL parity error on the returned data. It also could be caused by 
some previous error in the system environment or this CPU which leaves a cache block marked 
as owned in memory and not marked as owned in any cache in the system. 

S_CEFSTS<COUNT> indicates the number of quadwords received before the error.
(S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space.) The physical
address is in S_CEFADR.

CEFSTS<WRITE> should not be set. If it is, it is an inconsistent status (see Section 15.5.2.7).

I-stream read: The Bcache is not in ETM.

I-stream errors cause a machine check when the Ebox microcode requests dispatch to a new
instruction execution microflow or attempts to access an operand within an instruction execution 
microflow where the I-stream data with the error is required for the dispatch or access. 

D-stream read: The Bcache is not in ETM. 

D-stream read errors cause a machine check when the Ebox microcode accesses prefetched
operand data or when the Mbox returns data tagged with an error indication to the Ebox register 
file. 

D-stream ownership read: The Bcache is in ETM. No write data has been merged with the 
returning fills. 

The address should not be in IO space. If it is, it is an inconsistent status (see Section 15.5.2.7).

D-stream ownership read errors cause a machine check when the Ebox microcode accesses 
prefetched operand data or when the Ebox issues a read-lock. 

Pending Interrupts (all cases): A soft error interrupt should be pending. 

Recovery procedures (all cases): Clear CEFSTS<LOCK,TIMEOUT>. 

Additional Recovery procedures for D-stream ownership read: Flush the Bcache. Clear 
CCTL<HW_ETM> (after flushing the Bcache). 

Depending on the system environment, memory may have set its ownership bit for this block. 
The data in memory is presumably still good. The Bcache block is marked invalid in the Bcache 
tag store. 

If S_CEFSTS<COUNT> is greater than 0, then part of the data also is in the Bcache. In general,
it is not possible to determine which quadwords are valid. However, if S_CEFSTS<COUNT>
is 11 (binary) and S_CEFSTS<REQ_FILL_DONE> is not set, then the three quadwords in the
Bcache block other than the quadword pointed to by S_CEFADR are valid.
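
The rule for identifying the valid quadwords can be sketched as below. This is an illustration only: the function name is hypothetical, the block is taken to be a 32-byte hexaword of four quadwords, and S_CEFADR is treated as a byte address. That last point is an assumption for the example and should be checked against the register's actual format.

```python
HEXAWORD_BYTES = 32  # one Bcache block, four quadwords
QUADWORD_BYTES = 8


def valid_quadword_addresses(cefadr: int, count: int, req_fill_done: bool):
    """Return addresses of the valid quadwords, or None if unknown.

    Only COUNT = 11 (binary) with REQ_FILL_DONE clear identifies them:
    every quadword in the block except the one at `cefadr` is valid.
    """
    if count != 0b11 or req_fill_done:
        return None  # cannot be determined in general
    base = cefadr & ~(HEXAWORD_BYTES - 1)
    failing = (cefadr & (HEXAWORD_BYTES - 1)) // QUADWORD_BYTES
    return [base + i * QUADWORD_BYTES for i in range(4) if i != failing]
```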

If S_CEFSTS<COUNT> is greater than 0, and the address in S_CEFADR is not in IO space,
then the block was not owned before the operation began. In this case, use the procedures in 
Section 15.3.3.1.2.2 to determine if memory's ownership bit is set and this CPU owns the block. 
If so, use the system specific procedure (see Section 15.3.3.1.2.2.2) to reset it. In some systems 
(the XMI2 for example) this may require a quadword of correct data be written to memory to 
reset the ownership bit. Section 15.3.3.1.2.3 describes procedures for extracting data from the 
Bcache data RAMs in this case. 

If memory's ownership bit was left set as a result of this error and no non-destructive procedure 
exists for restoring it, then the hexaword block is lost. 

Retry condition (I-stream or D-stream read): Retry if the address is not in IO space and:

(VR = 1) OR (PSL<FPD> = 1). 

Retry condition (D-stream ownership read): Given that no data is lost, retry if the memory
state repair procedure is successful or not called for and if: 

(VR = 1) OR (PSL<FPD> = 1). 

If the hexaword block could not be repaired or data is lost, software must determine if the error 
is fatal to one process or the whole system and take appropriate action. (If it is fatal only to one 
process, use the system dependent procedure for resetting memory's ownership bit.)
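
The two retry conditions above can be expressed as simple predicates. The sketch below is illustrative only. The function names and boolean arguments are hypothetical: VR is assumed to have been extracted from the machine check stack frame by the caller, and the IO-space, data-loss, and repair results are assumed to come from the analysis steps described earlier. The only bit position used is PSL<FPD>, which is bit 27 of the VAX PSL.

```python
PSL_FPD = 1 << 27  # PSL<FPD> is bit 27 of the VAX PSL


def restart_allowed(vr: bool, psl: int) -> bool:
    """(VR = 1) OR (PSL<FPD> = 1)."""
    return vr or bool(psl & PSL_FPD)


def retry_plain_read(vr: bool, psl: int, in_io_space: bool) -> bool:
    """Retry condition for an I-stream or D-stream read."""
    return (not in_io_space) and restart_allowed(vr, psl)


def retry_ownership_read(vr: bool, psl: int, data_lost: bool,
                         repair_needed: bool, repair_ok: bool) -> bool:
    """Retry condition for a D-stream ownership read."""
    if data_lost:
        return False              # software must decide how fatal this is
    if repair_needed and not repair_ok:
        return False              # memory state could not be repaired
    return restart_allowed(vr, psl)
```

When either predicate returns False, the text above applies: software must determine whether the error is fatal to one process or to the whole system.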

Post Retry Recovery: If the same fill error recurs on retry, then the block is probably "lost". 1 
Software must determine if the error is fatal to one process or the whole system and take 
appropriate action. (If it is fatal only to one process, use the system dependent procedure for 
resetting memory's ownership bit.)

NOTE 

It may be appropriate in this case to first cause each CPU in the system to flush its 
Bcache, and then retry once more. 

NOTE 

It may be that another error (such as an uncorrectable tag store error on a coherence 
request) will be repaired by the soft error interrupt handler before the retry actually 
occurs, fortuitously repairing the cause of the fill error. 

15.5.2.6.5 NDAL I-Stream or D-Stream Read or D-Stream Ownership Read Data Errors

Description: An I-stream or D-stream read or D-stream ownership read ended with
an RDE (read data error) NDAL cycle before any of the fill quadwords were received. If
S_CEFSTS<COUNT> is 0 or the address in S_CEFADR is an IO space address, this is an accepted
means for a system environment to notify the NVAX CPU of "non-existent memory or IO location".
Otherwise, the error could be caused by an error in the system environment. It also could be 
caused by some previous error in the system environment or this CPU which leaves a cache block 
marked as owned in memory and not marked as owned in any cache in the system. 

S_CEFSTS<COUNT> indicates the number of quadwords received before the error.
(S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space.) The physical
address is in S_CEFADR. 

CEFSTS<WRITE> should not be set. If it is, it is an inconsistent status (see Section 15.5.2.7).

I-stream read: The Bcache is not in ETM.

I-stream errors cause a machine check when the Ebox microcode requests dispatch to a new 
instruction execution microflow or attempts to access an operand within an instruction execution 
microflow where the I-stream data with the error is required for the dispatch or access. 

D-stream read: The Bcache is not in ETM. 

D-stream read errors cause a machine check when the Ebox microcode accesses prefetched 
operand data or when the Mbox returns data tagged with an error indication to the Ebox register 
file. 



1 In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the
data back when a read is done to that location. In some systems, it may be possible to identify which CPU memory
"thinks" owns the data, but it is often not possible to determine which error caused this situation to arise.

D-stream ownership read: The Bcache is in ETM. No write data has been merged with the
returning fills. 

The address should not be in IO space. If it is, it is an inconsistent status (see Section 15.5.2.7).

D-stream ownership read errors cause a machine check when the Ebox microcode accesses 
prefetched operand data or when the Ebox issues a read-lock. 

Pending Interrupts (all cases): A soft error interrupt should be pending. 

Recovery procedures (all cases): Clear CEFSTS<LOCK,RDE>. 

Additional Recovery procedures for D-stream ownership read: Flush the Bcache. Clear 
CCTL<HW_ETM> (after flushing the Bcache). 

Depending on the system environment, memory may have set its ownership bit for this block. 
The data in memory could still be good. The Bcache block is marked invalid in the Bcache tag 
store. 

If S_CEFSTS<COUNT> is greater than 0, then part of the data also is in the Bcache. In general, 
it is not possible to determine which quadwords are valid. However, if the S_CEFSTS<COUNT> 
is 11 (binary) and S_CEFSTS<REQ_FILL_DONE> is not set, then the three quadwords in the 
Bcache block other than the quadword pointed to by S_CEFADR are valid. 

If S_CEFSTS<COUNT> is greater than 0, and the address in S_CEFADR is not in IO space,
then the block was not owned before the operation began. In this case, use the procedures in 
Section 15.3.3.1.2.2 to determine if memory's ownership bit is set and this CPU owns the block. 
If so, use the system specific procedure (see Section 15.3.3.1.2.2.2) to reset it. In some systems 
(the XMI2 for example) this may require a quadword of correct data be written to memory to 
reset the ownership bit. Section 15.3.3.1.2.3 describes procedures for extracting data from the 
Bcache data RAMs in this case. 

If memory's ownership bit was left set as a result of this error and no non-destructive procedure 
exists for restoring it, then the hexaword block is lost.

Retry condition (I-stream or D-stream read): Retry if the address is not in IO space and: 

(VR = 1) OR (PSL<FPD> = 1). 

Retry condition (D-stream ownership read): Given that no data is lost, retry if the memory
state repair procedure is successful or not called for and if: 

(VR = 1) OR (PSL<FPD> = 1). 

If the hexaword block could not be repaired or data is lost, software must determine if the error 
is fatal to one process or the whole system and take appropriate action. (If it is fatal only to one 
process, use the system dependent procedure for resetting memory's ownership bit.)

Post Retry Recovery: If the same fill error recurs on retry, then the block is probably "lost". 1
Software must determine if the error is fatal to one process or the whole system and take
appropriate action. (If it is fatal only to one process, use the system dependent procedure for
resetting memory's ownership bit.)



1 In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the
data back when a read is done to that location. In some systems, it may be possible to identify which CPU memory
"thinks" owns the data, but it is often not possible to determine which error caused this situation to arise.

NOTE

It may be appropriate in this case to first cause each CPU in the system to flush its 
Bcache, and then retry once more. 

NOTE 

It may be that another error (such as an uncorrectable tag store error on a coherence 
request) will be repaired by the soft error interrupt handler before the retry actually 
occurs, fortuitously repairing the cause of the fill error. 

15.5.2.6.6 Lost Bcache Fill Error 

Description: Some number of fill errors occurred and were not latched because CEFSTS and 
CEFADR already contained a report of an unrecoverable error. There is no guarantee this error 
could have caused a machine check, though it may be a cause. Lost Bcache fill errors which 
cause machine checks are always read errors, and can be retried unless the aborted instruction 
has altered essential state. If it is a machine check cause, the error will have caused a soft
error interrupt. Lost Bcache fill errors which can not have caused a machine check are dealt with
in the sections on hard and soft error interrupts. 

Lost Bcache fill errors may be caused by more than one operand prefetch to the same cache block. 

Recovery for lost Bcache fill errors depends on whether the pending interrupt is a hard or soft 
error interrupt. The machine check error handling software should defer recovery until the 
expected hard or soft error interrupt occurs. Once the interrupt is taken, the error recovery and 
restart instructions found in the hard error interrupt and soft error interrupt sections should be 
referenced. See Section 15.7.1.3.2 and Section 15.8.1.15. 

Software should employ some mechanism to record that an interrupt for a lost Bcache fill error 
is pending. This mechanism should allow detection of a case in which an expected interrupt does 
not occur (once IPL is lowered). If the expected interrupt does not occur when IPL is lowered, 
then a serious inconsistency exists and the system should be crashed. 
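
One possible shape for such a recording mechanism is sketched below. It is an illustration only: the class and method names are hypothetical, and a real handler would keep this state per CPU in whatever form the operating system's error-handling code already uses.

```python
class PendingErrorInterrupts:
    """Track error interrupts that a machine check handler expects.

    Hypothetical helper: the machine check handler calls expect(),
    the hard/soft error interrupt handler calls taken(), and the
    check is made once IPL has been lowered far enough for the
    interrupt to have been delivered.
    """

    def __init__(self):
        self._expected = set()

    def expect(self, kind: str) -> None:
        """Record that an interrupt for `kind` should be pending."""
        self._expected.add(kind)

    def taken(self, kind: str) -> None:
        """Record that the expected interrupt for `kind` occurred."""
        self._expected.discard(kind)

    def missing_after_ipl_lowered(self) -> list:
        """Return the interrupts that were expected but never arrived."""
        return sorted(self._expected)
```

If `missing_after_ipl_lowered()` returns a non-empty list, the expected interrupt did not occur, which the text above treats as a serious inconsistency that warrants crashing the system.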

The Bcache may be in ETM (S_CCTL<HW_ETM> will be set if it is). 

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both. 

Recovery procedures: No specific recovery action is required. Note that CEFSTS<LOST_ERR> 
is not cleared. It will be cleared by the hard or soft error interrupt handler. Also, the Bcache 
must remain in ETM (if it is already) until the error interrupt occurs. 

Retry condition: Retry only if: 

(VR = 1) OR (PSL<FPD> = 1). 

15.5.2.6.7 Unacknowledged NDAL I-Stream or D-Stream Read or D-Stream Ownership Read

Description: An I-stream or D-stream read or D-stream ownership read was no-ACKed by the
system environment. This could be because the external component(s) received bad NDAL parity
or it could be due to a system-specific notification of "non-existent memory or IO location". The
physical address is in S_NEOADR.

I-stream read: The Bcache is not in ETM. 

I-stream errors cause a machine check when the Ebox microcode requests dispatch to a new
instruction execution microflow or attempts to access an operand within an instruction execution 
microflow where the I-stream data with the error is required for the dispatch or access. 

D-stream read: The Bcache is not in ETM.

D-stream read errors cause a machine check when the Ebox microcode accesses prefetched 
operand data or when the Mbox returns data tagged with an error indication to the Ebox register 
file. 

D-stream ownership read: The Bcache is in ETM.

The address should not be in IO space. If it is, it is an inconsistent status (see Section 15.5.2.7).

D-stream ownership read errors cause a machine check when the Ebox microcode accesses 
prefetched operand data. 

Pending Interrupts (all cases): A soft error interrupt should be pending.

Recovery procedures (all cases): Clear NESTS<NOACK>.

Additional Recovery procedure for D-stream ownership read: Flush the Bcache. Clear 
CCTL<HW_ETM> (after flushing the Bcache). 

Retry condition: Retry if: 

(VR = 1) OR (PSL<FPD> = 1). 

15.5.2.6.8 Lost NDAL Output Error 

Description: Some number of NDAL output errors occurred and were not latched because 
NESTS, NEOADR, NEDATHI, and NEDATLO already contained a report of an unrecoverable 
error. There is no guarantee this error could have caused a machine check, though it may be a 
cause. Lost NDAL output errors which cause machine checks are always read errors, and can be 
retried unless the aborted instruction has altered essential state. If it is a machine check cause,
the error will have caused a soft error interrupt. Lost NDAL output errors which can not have
caused a machine check are dealt with in the sections on hard and soft error interrupts. 

Recovery for lost NDAL output errors depends on whether the pending interrupt is a hard or 
soft error interrupt. The machine check error handling software should defer recovery until the 
expected hard or soft error interrupt occurs. Once the interrupt is taken, the error recovery and 
restart instructions found in the hard error interrupt and soft error interrupt sections should be
referenced. See Section 15.7.1.5 and Section 15.8.1.17. 

Software should employ some mechanism to record that an interrupt for a lost NDAL output error 
is pending. This mechanism should allow detection of a case in which an expected interrupt does 
not occur (once IPL is lowered). If the expected interrupt does not occur once IPL is lowered, 
then a serious inconsistency exists and the system should be crashed. 

The Bcache may be in ETM (S_CCTL<HW_ETM> will be set if it is). 

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both. 

Recovery procedures: No specific recovery action is required. Note that NESTS<LOST_OERR>
is not cleared. It will be cleared by the hard or soft error interrupt handler. Also, the Bcache 
must remain in ETM (if it is already) until the error interrupt occurs. 

Retry condition: Retry only if:

(VR = 1) OR (PSL<FPD> = 1). 

15.5.2.6.9 PTE read errors 

The following sections describe error handling for PTE read errors. PTE read errors are read 
errors which happen in reads issued by the Mbox in handling a TB miss. Handling of these errors 
is different from handling the same underlying error (Bcache data RAM error, Bcache fill error, 
or NDAL no-ACK error) when PTE read isn't the cause. 

If S_PCSTS<PTE_ER> is set, then a PTE read issued by the Mbox in processing a TB miss had
an unrecoverable error. The TB miss sequence was aborted because of the error. The original 
reference can be any I-stream or D-stream read or write. If the original reference was issued by 
the Ebox, then the PTE read which incurred the error will have been retried once (because of a 
special hardware/microcode mechanism for handling PTE read errors on Ebox references). 

PTE read errors are difficult to analyze, partly because the read error report in the Cbox does 
not directly indicate that the failing read was a PTE read. Because of this and because PTE read 
errors should be rare (a very small percentage of the reads issued by the Mbox are PTE reads), 
multiple errors which interfere with the analysis of the PTE error are not considered recoverable. 

The mechanism for reporting PTE read errors on Ebox references involves the Mbox forcing the 
Ebox (via a microtrap) into the microcode routine which normally handles memory management 
faults. This routine probes the address of the original reference, effectively retrying the failing 
PTE read. Assuming the error is not transient, the probe by microcode will cause a machine check. 
If the error does not occur on the probe, microcode restarts the current instruction stream. So 
machine checks caused by PTE read errors can easily occur with the particular PTE read error 
having occurred twice (with a lost error bit set in the relevant Cbox error register). The analysis 
here tolerates these particular multiple error reports and allows retry in those cases, provided 
the remainder of the error analysis indicates retry is appropriate. (Note that there is no way to 
tell from the information available to the machine check handler whether the original reference 
was an Ebox or Ibox reference.) 

If the reference which incurs the PTE read error is a write, S_PCSTS<PTE_ER_WR> will be set. 
In this case the original write is lost. No retry is possible partly because the instruction which 
took the machine check may be subsequent to the one which issued the failing write. Also, PTE 
read errors on write transactions can cause a machine check at a practically arbitrary time in a 
microcode flow, and core machine state may not be consistent. 

15.5.2.6.9.1 PTE Read Errors in Interruptable Instructions 

Another special case associated with PTE read errors exists for interruptable instructions 
(specifically CMPC3, CMPC5, LOCC, MOVC3, MOVC5, SCANC, SKPC, and SPANC). For these 
instructions, if the PTE read error occurred for an Ebox reference, the PC in the machine 
check stack frame points to the instruction following the interrupted instruction. In this 
case, the SAVEPC element in the machine check stack frame is the PC of the interrupted 
instruction. However in all other cases, SAVEPC is UNPREDICTABLE. This case is not 
considered recoverable because analysis of the error information can not unambiguously conclude 
that this case is present. To tell that this case might be present, the error handler examines the 
FPD bit in the PSL in the machine check stack frame. If FPD is set in the stack frame (in the 
case of a PTE read error) then one of the following is true: 

• One of the interruptable instructions listed above incurred the PTE read error. In this case,
SAVEPC from the machine check stack frame points to the interrupted instruction, and PC 
in the stack frame points to the next instruction. 

• An REI instruction loaded a PSL with FPD set and a certain PC. The Ibox incurred the PTE 
read error in fetching the opcode pointed to by that PC. In this case, the PC in the stack 
frame points to the instruction which was the target of the REI and SAVEPC from the stack 
frame is unpredictable. 

It is not possible to determine with certainty which of the two above cases is the cause of a machine 
check with S_PCSTS<PTE_ER> set and stack frame PSL<FPD> set. Retry is not possible since
software can not tell which PC to restart with. However, software may wish to probe the location 
pointed to by the PC in the stack frame, expecting a possible machine check as a result. If 
a machine check does occur, that is information indicating that the second case occurred (not 
totally unambiguously, of course). A very good guess may be made by a person examining the
error report if the machine check stack frame and the result of this probe are available in the
report.

15.5.2.6.9.2 Bcache Data RAM Uncorrectable ECC Errors and Addressing Errors on PTE Reads 

Description (addressing errors): A Bcache addressing error was detected by the Cbox in a PTE 
read during a Bcache hit. Addressing errors are the result of a mismatch between the address 
the Cbox drives to the RAMs for a read access and the address used to write that location. A 
multiple bit data error can appear to be an addressing error, though it is extremely unlikely.

Description (uncorrectable ECC errors): A Bcache uncorrectable data error was detected 
by the Cbox in a PTE read during a Bcache hit. Uncorrectable data errors are the result of a 
multiple bit error in the data read from the Bcache. An addressing error with a single bit data 
error will appear as an uncorrectable data error. 

Description (all cases): The Bcache is in ETM. S_BCEDIDX contains the cache index of the
error, and BCEDECC contains the syndrome calculated by the ECC logic. The physical address
of the PTE read can be found by reading the tag for the data block (using the procedure in
Section 15.3.3.1.2.4). (If the physical address is found to be in IO space, it is an inconsistent
status. See Section 15.5.2.7.)

If the block's tag is found to contain an ECC error, then the address can not be determined. 

S_BCEDSTS<LOST_ERR> may be set. This error is probably due to the same PTE error 
occurring more than once. This is an acceptable assumption unless a hard error interrupt occurs 
after handling this error. 

It should never be the case that both S_BCEDSTS<BAD_ADDR> and S_BCEDSTS<UNCORR> 
are set. If they are, it is an inconsistent status (see Section 15.5.2.7). 

Pending Interrupts: A soft error interrupt should be pending. 

Recovery procedures (addressing errors): To recover, clear BCEDSTS<LOCK, BAD_ADDR>. 

Recovery procedures (uncorrectable ECC errors): To recover, clear BCEDSTS<LOCK, 
UNCORR>. 

Recovery procedures (both cases): Flush the Bcache. Clear CCTL<HW_ETM> (after flushing
the Bcache). Clear PCSTS<PTE_ER>. If the data is owned by the Bcache and if the error repeats 
itself (is not transient), then a writeback error will result from the flush procedure. Software 
should prepare for this by clearing NESTS and BCEDSTS errors. 

Retry condition: If no writeback error occurs in the Bcache flush, retry if: 

(VR = 1) AND (PSL<FPD> = 0) AND (S_PCSTS<PTE_ER_WR> = 0). 

If 

(PSL<FPD> = 1) OR (S_PCSTS<PTE_ER_WR> = 1), 

crash the system. 

If a writeback error occurs in the Bcache flush, then the data is presumed to be unrecoverable. See 
Section 15.8.1.10 for a description of handling an error in a writeback (software must determine 
if the error is fatal to one process or the whole system and take appropriate action). 
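
The decision described above for Bcache data RAM errors on PTE reads can be summarized in a small sketch. It is illustrative only, and several things in it are assumptions: the names are hypothetical, the inputs are assumed to be already decoded (VR and PSL<FPD> from the machine check stack frame, PTE_ER_WR from the saved S_PCSTS), and the crash test is applied before the writeback test, an ordering the text does not state explicitly. The text also gives no action for the case where none of the listed conditions holds, so the sketch reports that case separately.

```python
from enum import Enum


class Action(Enum):
    RETRY = "retry"
    CRASH = "crash the system"
    WRITEBACK_ERROR = "handle as a writeback error (Section 15.8.1.10)"
    NO_RETRY = "no retry condition met"


def pte_bcache_error_action(vr: bool, fpd: bool, pte_er_wr: bool,
                            writeback_error: bool) -> Action:
    """Choose the follow-up action after the recovery procedure.

    `writeback_error` says whether the Bcache flush performed during
    recovery produced a writeback error.
    """
    if fpd or pte_er_wr:
        # (PSL<FPD> = 1) OR (S_PCSTS<PTE_ER_WR> = 1): crash the system.
        return Action.CRASH
    if writeback_error:
        # The data is presumed unrecoverable; no retry.
        return Action.WRITEBACK_ERROR
    if vr:
        # (VR = 1) AND (PSL<FPD> = 0) AND (S_PCSTS<PTE_ER_WR> = 0)
        return Action.RETRY
    # Not covered by the conditions above (assumption: do not retry).
    return Action.NO_RETRY
```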

15.5.2.6.9.3 NDAL PTE Read Timeout Errors 

Description: A PTE read timed out before any fill quadword was received. This is not an 
accepted means for a system environment to notify the NVAX CPU of "non-existent memory or 
IO location". The error could be caused by an error in the system environment or an NDAL
parity error on the returned data. It also could be caused by some previous error in the system 
environment or this CPU which leaves a cache block marked as owned in memory and not marked 
as owned in any cache in the system. 

S_CEFSTS<COUNT> indicates the number of quadwords received before the error. 
(S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space.) The physical
address is in S_CEFADR.

CEFSTS<WRITE> should not be set. If it is, it is an inconsistent status (see Section 15.5.2.7). 

The Bcache is not in ETM. The read was not an ownership read, so this error can not have caused 
the ownership bits in memory to be left in the wrong state. 

S_CEFSTS<LOST_ERR> may be set. This error is probably due to the same PTE error occurring 
more than once. This is an acceptable assumption unless a hard error interrupt occurs after 
handling this error. 

Pending Interrupts: A soft error interrupt should be pending. 

Recovery procedures: Clear CEFSTS<LOCK, TIMEOUT>. Clear PCSTS<PTE_ER>.

Retry condition: Retry if:

(VR = 1) AND (PSL<FPD> = 0) AND (S_PCSTS<PTE_ER_WR> = 0).
Otherwise, crash the system. 

Post Retry Recovery: If the same fill error recurs on retry, then the block is probably "lost". 1 
Software must determine if the error is fatal to one process or the whole system and take 
appropriate action. (If it is fatal only to one process, use the system dependent procedure for 
resetting memory's ownership bit.)



1 In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the
data back when a read is done to that location. In some systems, it may be possible to identify which CPU memory 
"thinks" owns the data, but it is often not possible to determine which error caused this situation to arise. 

NOTE

It may be appropriate in this case to first cause each CPU in the system to flush its 
Bcache, and then retry once more. 

NOTE 

It may be that another error (such as an uncorrectable tag store error on a coherence 
request) will be repaired by the soft error interrupt handler before the retry actually 
occurs, fortuitously repairing the cause of the fill error. 



15.5.2.6.9.4 NDAL PTE Read Data Errors 

Description: A PTE read ended with an RDE (read data error) NDAL cycle before any of the fill
quadwords were received. If S_CEFSTS<COUNT> is 0 or the address in S_CEFADR is an IO
space address, this is an accepted means for a system environment to notify the NVAX CPU of
"non-existent memory or IO location". Otherwise, the error could be caused by an error in the
system environment. It also could be caused by some previous error in the system environment 
or this CPU which leaves a cache block marked as owned in memory and not marked as owned 
in any cache in the system. 

S_CEFSTS<COUNT> indicates the number of quadwords received before the error. 
(S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space.) The physical 
address is in S_CEFADR. 

CEFSTS<WRITE> should not be set. If it is, it is an inconsistent status (see Section 15.5.2.7). 

The physical address of the PTE is in S_CEFADR. The Bcache is not in ETM. The read could not
have been an ownership read, so this error can not have caused the ownership bits in memory to 
be left in the wrong state. 

S_CEFSTS<LOST_ERR> may be set. This error is probably due to the same PTE error occurring 
more than once. This is an acceptable assumption unless a hard error interrupt occurs after 
handling this error. 

Pending Interrupts: A soft error interrupt should be pending. 

Recovery procedures: Clear CEFSTS<LOCK, RDE>. Clear PCSTS<PTE_ER>. 

Retry condition: Retry if: 

(VR = 1) AND (PSL<FPD> = 0) AND (S_PCSTS<PTE_ER_WR> = 0). 
Otherwise, crash the system. 

Post Retry Recovery: If the same fill error recurs on retry, then the block is probably "lost". 1 
Software must determine if the error is fatal to one process or the whole system and take 
appropriate action. (If it is fatal only to one process, use the system dependent procedure for 
resetting memory's ownership bit.)



1 In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the
data back when a read is done to that location. In some systems, it may be possible to identify which CPU memory 
"thinks" owns the data, but it is often not possible to determine which error caused this situation to arise. 

NOTE

It may be appropriate in this case to first cause each CPU in the system to flush its 
Bcache, and then retry once more. 

NOTE 

It may be that another error (such as an uncorrectable tag store error on a coherence 
request) will be repaired by the soft error interrupt handler before the retry actually 
occurs, fortuitously repairing the cause of the fill error. 

15.5.2.6.9.5 Unacknowledged NDAL PTE Read 

Description: A PTE read was no-ACKed by the system environment. This could be because the 
external component(s) received bad NDAL parity or it could be due to a system-specific notification 
of "non-existent memory or IO location".

The physical address of the PTE is in S_NEOADR. The Bcache is not in ETM.

S_NESTS<LOST_OERR> may be set. This error is probably due to the same PTE error occurring
more than once. This is an acceptable assumption unless a hard error interrupt occurs after 
handling this error. 

Pending Interrupts: A soft error interrupt should be pending.

Recovery procedures: Clear NESTS<NOACK>. Clear PCSTS<PTE_ER>.

Retry condition: Retry if:

(VR = 1) AND (PSL<FPD> = 0) AND (S_PCSTS<PTE_ER_WR> = 0). 
Otherwise, crash the system. 

15.5.2.6.9.6 Multiple Errors Which Interfere with Analysis of PTE Read Error

Because PTE read errors lead to several unusual cases, retry is not recommended in the event 
that other errors cloud the analysis of the PTE read error. 

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both.

Recovery procedures: No specific recovery action is called for.

Retry condition: No retry is possible. Crash the system.

15.5.2.7 Inconsistent Status in Machine Check Cause Analysis 

Description: A presumed impossible error report was found in the error registers. This could 
be due to a hardware failure or bug, or to incomplete analysis in this spec. 

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both. 

Recovery procedures: No specific recovery action is called for. 

Retry condition: No retry is possible. The integrity of the entire system is questionable. Crash 
the system. 

15.6 Power Fail Interrupt

Power fail interrupts are requested to report imminent loss of power to the CPU. Power fail
interrupts are requested via the PWRFL_L pin at IPL 1E (hex) and are dispatched to the operating
system through SCB vector 0C (hex).

The stack frame for a power fail interrupt is shown in Figure 15-6. 
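
Figure 15-6: Power Fail Interrupt Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+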

15.7 Hard Error Interrupts



Hard error interrupts are requested to report an error that was detected asynchronously with 
respect to instruction execution. This results in an interrupt at IPL 1D (hex) to be dispatched
through SCB vector 60 (hex). Typically, these errors indicate that machine state has been corrupted
and that retry is not possible. 

The stack frame for a hard error interrupt is shown in Figure 15-7.



Figure 15-7: Hard Error Interrupt Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



15.7.1 Events Reported Via Hard Error Interrupts 

This section describes all the errors which can cause a hard error interrupt. A parse tree is given 
which shows how to determine the cause of a given hard error. After that, there is a description 
of each error. For each error, the recovery procedure is given. Where appropriate, the conditions 
for restart are given. See Section 15.3.3 and Section 15.3.4 for more on error recovery and error 
retry. 

Figure 15-8 is a parse tree which should be used to analyze the cause of a hard error interrupt.
It is assumed that the state being analyzed is the saved state, as described in Section 15.3.1. 
Otherwise the state could change during the analysis procedure, leading to possibly incorrect 
conclusions. (See Section 15.3.2 for general information about error analysis.) 

Figure 15-8: Cause Parse Tree for Hard Error Interrupts

HARD ERROR INTERRUPT
|
+---+ (select all, at least one)
    |
    | S_BCEDSTS<LOCK>
    +---+ (select one)
    |   |
    |   | S_BCEDSTS<BAD_ADDR>
    |   +---+
    |   |   | S_BCEDSTS<DR_CMD>=RMW
    |   |   +---> Bcache data RAM addressing error on a write or write-unlock
    |   |   |     from Mbox (Section 15.7.1.1)
    |   |   | otherwise
    |   |   +---> Not a hard error interrupt cause (see soft error interrupt
    |   |         events)
    |   |
    |   | S_BCEDSTS<UNCORR>
    |   +---+
    |   |   | S_BCEDSTS<DR_CMD>=RMW
    |   |   +---> Bcache data RAM uncorrectable ECC error on a write or
    |   |   |     write-unlock from Mbox (Section 15.7.1.1)
    |   |   | otherwise
    |   |   +---> Not a hard error interrupt cause (see soft error interrupt
    |   |         events)
    |   |
    |   | none of the above
    |   +---> Inconsistent status (no BCEDSTS unrecoverable error bits
    |         set) (Section 15.7.1.7)
    |
    | S_BCEDSTS<LOST_ERR>
    +---> Lost unrecoverable Bcache data RAM error
    |     (Section 15.7.1.2)
    |
    | S_CEFSTS<LOCK>
    +---+ (select one)
    |   |
    |   | S_CEFSTS<TIMEOUT> AND S_CEFSTS<REQ_FILL_DONE>
    |   | AND S_CEFSTS<WRITE> AND S_CEFSTS<OREAD>
    |   +---> NDAL timeout on OREAD for write from Mbox after write data
    |   |     merged with fill data in cache (Section 15.7.1.3)
    |   |
    |   | S_CEFSTS<RDE> AND S_CEFSTS<REQ_FILL_DONE>
    |   | AND S_CEFSTS<WRITE> AND S_CEFSTS<OREAD>
    |   +---> NDAL read data error on OREAD for write from Mbox after
    |   |     write data merged with fill data in cache (Section 15.7.1...)
    |   |
    |   | S_CEFSTS<UNEXPECTED_FILL>
    |   +---> Unexpected NDAL fill received
    |   |     (Section 15.7.1.3.1)
    |   |
    |   | otherwise
    |   +---> Not a hard error interrupt cause (see soft error interrupt
    |         events)
    |
    | S_CEFSTS<LOST_ERR>
    +---> Lost Bcache fill error
    |     (Section 15.7.1.3.2)
    |
    | S_NESTS<NOACK>
    +---+ (select one)
    |   |
    |   | S_NEOCMD<CMD>=WRITE
    |   +---> no-ACK on WRITE command or data cycle
    |   |     (Section 15.7.1.4)
    |   |
    |   | S_NEOCMD<CMD>=WDISOWN
    |   +---> no-ACK on WDISOWN command or data cycle
    |   |     (Section 15.7.1.4)
    |   |
    |   | otherwise
    |   +---> Not a hard error interrupt cause (see soft error interrupt
    |         events)
    |
    | S_NESTS<LOST_OERR>
    +---> Lost no-ACK error
    |     (Section 15.7.1.5)
    |
    | (status consistent with hard error interrupt
    | in system environment error registers)
    +---> Hard error interrupt from system environment
    |     (Section 15.7.1.6)
    |
    | otherwise
    +---> Inconsistent status (Section 15.7.1.7)



Notation: 

(select one)               - Exactly one case must be true. If zero or more than one is
                             true, the status is inconsistent.

(select all)               - More than one case may be true.

(select all, at least one) - All the cases are possible causes of a hard error interrupt.
                             More than one may be true. At least one must be true or the status
                             is inconsistent. A case is not considered true if it evaluates to
                             "Not a hard error interrupt cause".

otherwise                  - fall-through case for (select one) if no other case is true.

none of the above          - fall-through case for (select all) or (select all, at least one)
                             if no other case is true.
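
The notation above can be read as a small evaluation procedure. The following Python sketch is one hypothetical encoding of it and is not part of the NVAX specification: the helper names (select_one, select_all_at_least_one, nests_noack_branch) and the dictionary key "NEOCMD.CMD" are assumptions made for illustration, and register fields are modeled as named values rather than real bit positions. The example applies the helpers to the S_NESTS<NOACK> branch of Figure 15-8.

```python
NOT_A_CAUSE = "Not a hard error interrupt cause"
INCONSISTENT = "Inconsistent status"


def select_one(cases, otherwise=None):
    """(select one): exactly one case must be true.

    cases is a list of (condition, outcome) pairs. If no case is true, the
    'otherwise' outcome is used when one is supplied. Zero true cases with
    no fall-through, or more than one true case, is an inconsistent status.
    """
    hits = [outcome for condition, outcome in cases if condition]
    if len(hits) == 1:
        return hits[0]
    if not hits and otherwise is not None:
        return otherwise
    return INCONSISTENT


def select_all_at_least_one(cases):
    """(select all, at least one): collect every true case.

    A case that evaluates to NOT_A_CAUSE is not considered true. If nothing
    remains, the status is inconsistent.
    """
    hits = [outcome for condition, outcome in cases
            if condition and outcome != NOT_A_CAUSE]
    return hits if hits else [INCONSISTENT]


def nests_noack_branch(saved):
    """S_NESTS<NOACK> branch of Figure 15-8 (the key name is an assumption)."""
    cmd = saved["NEOCMD.CMD"]
    return select_one(
        [
            (cmd == "WRITE", "no-ACK on WRITE command or data cycle"),
            (cmd == "WDISOWN", "no-ACK on WDISOWN command or data cycle"),
        ],
        otherwise=NOT_A_CAUSE,
    )
```

A caller would evaluate each top-level branch this way on the saved state and pass the results to select_all_at_least_one, so that branches which resolve to "Not a hard error interrupt cause" do not count as causes.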



15.7.1.1 Uncorrectable Data Errors and Addressing Errors During Write or Write-Unlock 
Processing 

Description: In processing a write or write-unlock, the Cbox detected an addressing error or 
an uncorrectable ECC error on the data read from the Bcache data RAMs. The write data has 
already been merged with the corrupted Bcache data and the write of the merged ("bad") data 
occurred. Data from the write is lost. 

There are two types of uncorrectable Bcache data RAM errors: addressing errors and 
uncorrectable ECC errors. Both are detected through the ECC check logic. Uncorrectable ECC 
errors indicate that two or more bits of the stored data quadword have changed and the error 
correcting code can not correct the data. A multiple-bit data error can appear to be an addressing
error, though this is extremely unlikely. A single-bit error combined with an addressing error
appears as an uncorrectable error. 

Addressing errors indicate that the location read from the data RAM was probably written using a 
different address than the one used to read it out. The actual hardware failure could have occurred 
in the previous data RAM write or the current read. Addressing errors are more serious than 
uncorrectable ECC errors since they indicate the integrity of the entire Bcache is questionable. 
Also, there is less than a 100% chance that a given addressing error will result in recognition 
of an addressing error. This is because addressing errors are recognized by encoding the parity 
of the address with the data and checking it on read back. All single-bit addressing errors are 
detectable. Note that addressing errors on writes are never detected if that data is never read
out again. 
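
The reason detection is not guaranteed can be seen in a deliberately simplified model. The Python sketch below is an assumption-laden illustration, not the NVAX ECC scheme: it folds a single address-parity bit into a single data-parity check bit, whereas the real Bcache uses a multi-bit error correcting code whose layout is not given here. The function names are hypothetical.

```python
def parity(value):
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def write_cell(address, data):
    """Store data with a check bit that also encodes the address parity."""
    return (data, parity(data) ^ parity(address))


def read_ok(address, cell):
    """Recompute the check bit with the read address and compare."""
    data, check = cell
    return check == (parity(data) ^ parity(address))


cell = write_cell(0b1010, 0x55)
single_bit_detected = not read_ok(0b1011, cell)   # address differs in one bit
double_bit_missed = read_ok(0b1001, cell)         # address differs in two bits
```

In this model a read through an address that differs in one bit always fails the check, while an address that differs in an even number of bits has the same parity and passes unnoticed, which is consistent with the statement that all single-bit addressing errors are detectable but detection in general is less than certain.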

The Cbox inverts three of the check bits being written back into the data RAMs to ensure that 
if the data is read again an uncorrectable error will be detected. If a subsequent read occurs, 
S_BCEDSTS<LOST_ERR> should be set, and the instruction which issued the read will machine
check. However, this mechanism is not fully reliable at ensuring that a subsequent read will
detect the error (see Section 15.11.1, Note On Tagged-Bad Data Mechanisms).

For either case, the physical address is determined from the contents of S_BCEDIDX using the 
procedure in Section 15.3.3.1.2.4. (If the physical address is found to be in IO space, it is an
inconsistent status. See Section 15.7.1.7.) S_BCEDECC contains the syndrome calculated by the 
ECC logic. The Bcache is in ETM. 

If the block's tag is found to contain an ECC error, then the address can not be determined. 

It should never be the case that both S_BCEDSTS<BAD_ADDR> and S_BCEDSTS<UNCORR> 
are set. If they are, it is an inconsistent status (see Section 15.7.1.7). 

Recovery procedures (addressing error): Clear BCEDSTS<BAD_ADDR, LOCK>. 

Recovery procedures (uncorrectable ECC error): Clear BCEDSTS<UNCORR, LOCK>. 

Recovery procedures (both cases): The data in this block is lost. Flush the Bcache. Clear 
CCTL<HW_ETM> (after flushing the Bcache). Flushing the Bcache should cause a writeback 
error (in which BADWDATA will be sent on the NDAL), so BCEDSTS and NESTS should be 
cleared beforehand. Then use the system specific procedure to clear the tagged-bad state from 
this block in memory. 

It is possible that no writeback error will occur, or that it will happen at the wrong address. This 
would occur if an error in the data RAMs caused the data to appear as correctable or without 
error even though it was written with three ECC bits inverted. Also, this could occur if the data 
was written to a different location than intended (addressing error). If this happens, then the 
block in memory will incorrectly appear to be good data. 

NOTE 

When clearing the tagged-bad data state of memory, software must first ensure that no 
more accesses to the block can occur. Otherwise there is the danger that some process 
on some other processor or a DMA IO device will see incorrect data and not detect an 
error. 

Restart condition (addressing error): Addressing errors occur on data RAM reads and writes. 
Because the Cbox writes "bad" data back into the location, there is no way to distinguish transient 
read errors from transient write errors. Therefore, the worst case has to be assumed: some 
previous write was written to the wrong place in the Bcache or the failing write has been written 
to the wrong location in the Bcache. In other words, not only is the block whose address is known 
corrupted, but another block is as well. No restart is possible. The integrity of the entire system 
is questionable. Crash the system. 

Restart condition (uncorrectable ECC error): If the address of the data is available and no 
unexpected writeback errors occurred during the Bcache flush, software must determine if the 
lost data is fatal to one process or the whole system and take the appropriate action. 

If the address of the data could not be determined or unexpected errors occurred during the 
Bcache flush, crash the system. 
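
The restart rules for this error can be summarized as a decision function. The Python sketch below is a hypothetical restatement of the text above, with invented parameter names; it assumes the caller has already determined whether the physical address could be recovered and whether unexpected errors were seen during the Bcache flush.

```python
def restart_decision(bad_addr, uncorr, address_known, unexpected_flush_errors):
    """Return the restart action for a write or write-unlock data RAM error.

    bad_addr and uncorr stand for S_BCEDSTS<BAD_ADDR> and S_BCEDSTS<UNCORR>.
    """
    if bad_addr == uncorr:
        # Both set should never happen; neither set means no unrecoverable
        # error bit is present. Either way the status is inconsistent.
        return "crash (inconsistent status)"
    if bad_addr:
        # Addressing error: another block may also be corrupted.
        return "crash"
    if address_known and not unexpected_flush_errors:
        return "software decides: fatal to one process or the whole system"
    return "crash"
```

The only path that does not end in a system crash is an uncorrectable ECC error whose address is known and whose Bcache flush completed without unexpected errors.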
15.7.1.2 Lost Bcache Data RAM Hard Errors

Description: Some number of unrecoverable Bcache data RAM errors occurred and were not 
latched because BCEDSTS already contained a report of an unrecoverable error. There is no 
guarantee this error could have caused a hard error interrupt, though it may be a cause. 

Lost Bcache data RAM errors may be caused by more than one operand prefetch to the same 
cache block. 

Bcache data RAM errors which cause hard error interrupt indicate that write data has been lost. 
Specifically, a read-modify-write operation for a write or write-unlock had an uncorrectable ECC
error or an addressing error. The data was written back into the RAMs with three check bits 
inverted. 

The Bcache is in ETM. 

Pending interrupts: A soft error interrupt may be pending. 

Recovery procedures: Clear BCEDSTS<LOST_ERR>. Flush the Bcache. Clear 
CCTL<HW_ETM> (after flushing the Bcache). 

Restart condition: No restart is possible since the errors which were not recorded could 
potentially have caused lost write data and no indication of what data is lost exists (based on 
the fact that this error was reported by hard error interrupt). Also, the possibility exists that a 
subsequent read to any location which had this error could receive incorrect data with no error 
indication. Crash the system. 

NOTE 

The lost data should be marked bad through the Bcache tagged-bad scheme. But there 
is a significant probability of an error converting that tagged-bad location back to good 
data. This is because precisely the location which had the data error is being depended 
on to store a different value without an error. The Bcache tagged-bad scheme does 
not reliably preserve the bad data status of the location in the presence of errors (see 
Section 15.11.1, Note On Tagged-Bad Data Mechanisms). So the tagged-bad locations 
may appear good to a subsequent reader. This is why the system must be crashed. 



15.7.1.3 Bcache Timeout or Read Data Error in Quadword OREAD Fill After Write Data Merged

Description: A D-stream ownership read for a write or write-unlock timed out or terminated 
receiving an RDE fill response after the requested quadword was received. The error could be 
due to an error in the system environment or to any previous error in the system environment or
this CPU which leaves a cache block marked as owned in memory and not marked as owned in 
any cache in the system. 

The quadword physical address is in S_CEFADR. The address should not be in IO space. If it
is, it is an inconsistent status (see Section 15.7.1.7). The merged data is in the Bcache in the 
quadword indicated in S_CEFADR. The ownership and valid bits in the Bcache are not set. 

CEFSTS<WRITE> should not be set. If it is, it is an inconsistent status (see Section 15.7.1.7). 

Recovery procedures: Clear CEFSTS<LOCK>. Clear CEFSTS<TIMEOUT> if the error is a 
timeout, and CEFSTS<RDE> if it is a read data error. Flush the Bcache. Clear CCTL<HW_ETM> 
(after flushing the Bcache). 
Depending on the system environment, memory may have set its ownership bit for this block.
This should be predictable for the given system environment because at least one quadword of 
data was received successfully. If the bit is set, then subsequent reads and writes to the same 
location may fail while the error is being handled. 

The data in memory should be unchanged. The quadword containing the merged data is in the 
Bcache. 

In general, the memory block can not be repaired. However, assuming the memory block is 
left owned, no writes to the block have timed out in memory, and the block is private to the 
interrupted job, it can be repaired by the following procedure. 

• Extract the addressed quadword from the Bcache (see Section 15.3.3.1.2.3). 

• Reset memory's ownership state (see Section 15.3.3.1.2.2.2) and write the extracted quadword 
to memory. 

NOTE 

Software must somehow ensure that no writes to this block are pending in the memory
before beginning the repair. This can be done by waiting an amount of time equal to a
memory subsystem write timeout time.

If memory's ownership bit is not set, the block can not be repaired. 

Restart condition: If memory state repair is successful, restart. Otherwise, software must 
determine if the lost data is fatal to one process or the whole system and take the appropriate 
action. 
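
The repair preconditions and step order described above can be sketched as follows. This Python fragment is illustrative only: the function and parameter names are hypothetical, and the steps are returned as descriptive strings because the actual extraction and ownership-reset procedures are system specific (see the sections cited in the text).

```python
def repair_plan(memory_owned, write_timed_out_in_memory, private_to_job):
    """Return the ordered repair steps, or an empty list if repair is not possible.

    Repair is attempted only when memory's ownership bit is set, no writes to
    the block have timed out in memory, and the block is private to the
    interrupted job.
    """
    repairable = (memory_owned
                  and not write_timed_out_in_memory
                  and private_to_job)
    if not repairable:
        return []
    return [
        "wait one memory subsystem write timeout period",
        "extract the addressed quadword from the Bcache",
        "reset memory's ownership state",
        "write the extracted quadword to memory",
    ]
```

An empty plan corresponds to the case where software must decide whether the lost data is fatal to one process or to the whole system.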

15.7.1.3.1 Unexpected Fill Error 

Description: At least one fill was received when none for that transaction ID was expected by 
the NVAX CPU. This can only occur if a serious NDAL error has occurred. Reads previous to this 
event may have received incorrect data. 

If S_CEFSTS<RDE> is set, the unexpected fill was an RDE NDAL transaction.
The Bcache is in ETM. S_CEFADR is UNPREDICTABLE.

Recovery procedures: Clear CEFSTS<LOCK, UNEXPECTED_FILL>. Flush the Bcache and
clear CCTL<HW_ETM> (in that order).

Restart condition: Data may have been corrupted in memory because of incorrect read data 
being processed. Crash the system. 

15.7.1.3.2 Lost Bcache Fill Error 

Description: Either at least one fill error occurred in an OREAD after write data was merged 
or an unexpected fill was received. The error was not latched because CEFSTS and associated 
registers already contained a report of an unrecoverable error. There is no guarantee this error 
could have caused a hard error interrupt, though it may be a cause. 

The Bcache may be in ETM. Read S_CCTL<HW_ETM> to find out. 

Pending interrupts: A soft error interrupt may be pending. 
Recovery procedures: Clear CEFSTS<LOST_ERR>. If the Bcache is in ETM, flush it and clear
CCTL<HW_ETM> (in that order). 

Restart condition: Data has been corrupted but the address is unknown. Crash the system. 

15.7.1.4 NDAL No-ACK During WRITE or WDISOWN 

Description: When the Cbox issues an NDAL WRITE or WDISOWN on the NDAL and it is 
not acknowledged, the Cbox requests a hard error interrupt. This could be because the external 
component(s) received bad NDAL parity or it could be due to a system-specific notification of
"non-existent memory or IO location". The transaction is not retried by hardware, so the data is
lost. Typically, for writebacks, the Bcache location is overwritten soon after this error, so there is 
no way to recover the data from the Bcache. 

The Bcache is in ETM. S_NEOADR contains the physical address. S_NEOCMD contains the byte
mask and NDAL command. 

Recovery procedures: Clear NESTS<NOACK>. Flush the Bcache. Clear CCTL<HW_ETM> 
(after flushing the Bcache). 

Retry condition: Software must determine if the lost data is fatal to one process or the whole 
system and take the appropriate action. 

15.7.1.5 Lost NDAL No-ACK Hard Errors 

Description: Some number of outgoing NDAL WRITE or WDISOWN commands were not 
acknowledged and were not latched because NESTS, NEOCMD, and NEOADR already contained 
a report of an NDAL output error. There is no guarantee this error could have caused the hard 
error interrupt, though it may be a cause. 

Pending interrupts: A soft error interrupt may be pending. 

Recovery procedures: Clear NESTS<LOST_NOACK>. 

Restart condition: No restart is possible since the errors which were not recorded could 
potentially have caused lost write data. No indication of what data is lost exists. Crash the 
system. 

15.7.1.6 System Environment Hard Error Interrupts 

Description: Errors which occur in the system environment and result in loss of data and 
which can not notify the NVAX CPU by returning RDE notify the CPU of the error by asserting
H_ERR_L (e.g., write errors). Errors which can be signaled by RDE should not use hard error 
interrupt notification. Errors which are corrected automatically by hardware and do not result 
in loss of data should use soft error interrupt notification instead. 

NOTE 

It is very important that components in the system environment which assert 
H_ERR_L have a CPU accessible register which unambiguously reports the H_ERR_L
assertion. Otherwise, system specific error handling for the hard error interrupt would 
always crash the system (every time). 
It is also strongly recommended that an address be stored where applicable. This may
allow the operating system to kill one process or job instead of crashing the system in 
the event of that hard error. 

Recovery procedures: Clear the error status bits in the system registers and perform any 
necessary system dependent recovery procedure. 

Restart condition: Depends on the error. If the system environment reports the address of the 
lost data (where applicable) software may be able to kill just one process instead of crashing the 
system. 

15.7.1.7 Inconsistent Status in Hard Error Interrupt Cause Analysis

Description: A presumed impossible error report was found in the error registers. This could 
be due to a hardware failure or bug. 

Recovery procedures: No specific recovery action is called for. 

Restart condition: No retry is possible. The integrity of the entire system is questionable. 
Crash the system. 
15.8 Soft Error Interrupts



Soft error interrupts are requested to report errors which were detected, but did not affect 
instruction execution. This results in an interrupt at IPL 1A (hex) to be dispatched through 
SCB vector 64 (hex). 

The stack frame for a soft error interrupt is shown in Figure 15-9.



Figure 15-9: Soft Error Interrupt Stack Frame 



 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



15.8.1 Events Reported Via Soft Error Interrupts 

This section describes all the errors which can cause a soft error interrupt. A parse tree is given 
which shows how to determine the cause of a given soft error. After that, there is a description 
of each error. For each error, the recovery procedure is given. Where appropriate, the conditions 
for restart are given. See Section 15.3.3 and Section 15.3.4 for more on error recovery and error 
retry. 

Figure 15-10 is a parse tree which should be used to analyze the cause of a soft error interrupt. 
It is assumed that the state being analyzed is the saved state, as described in Section 15.3.1. 
Otherwise the state could change during the analysis procedure, leading to possibly incorrect 
conclusions. (See Section 15.3.2 for general information about error analysis.) 

Note that many errors which cause a soft error interrupt may also lead to a machine check 
exception. For this reason, a soft error interrupt with no apparent cause is not an inconsistent 
state unless the CPU has executed an instruction while IPL was lower than 1A (hex) since the 
most recent machine check exception. 

When a soft error interrupt is the only notification for any memory read error which could cause 
a machine check, the error didn't cause a machine check for one of the following reasons. 

• The error did not occur on the quadword the Ebox or Ibox requested (Pcache fill error). 

• The Ebox took an interrupt before accessing an instruction or operand which was prefetched 
by the Ibox. (It could be this soft error interrupt.) 

• A prefetched instruction or operand belonged to an instruction following a mispredicted 
branch, so the Ebox never executed the instruction (and it was flushed from the pipeline 
when the branch mispredict was recognized). 

• The Ebox took an exception for a different reason before attempting to use an instruction 
execution dispatch or access an operand prefetched by the Ibox. (The pipeline was flushed 
because of the exception.) 
Figure 15-10: Cause Parse Tree for Soft Error Interrupts



SOFT ERROR INTERRUPT
+ (select all, at least one)
|
| S_ICSR<LOCK>
+-----+ (select all, at least one)
|     |
|     | S_ICSR<DPERR>
|     +-----> VIC (virtual instruction cache) data parity error
|     |       (Section 15.8.1.1)
|     | S_ICSR<TPERR>
|     +-----> VIC tag parity error (Section 15.8.1.1)
|     | none of the above
|     +-----> Inconsistent status (no ICSR error bits set)
|             (Section 15.8.1.22)
| S_PCSTS<LOCK>
+-----+ (select all, at least one)
|     |
|     | S_PCSTS<DPERR>
|     +-----> Pcache data parity error (Section 15.8.1.2)
|     | S_PCSTS<RIGHT_BANK>
|     +-----> Pcache tag parity error in right bank
|     |       (Section 15.8.1.2)
|     | S_PCSTS<LEFT_BANK>
|     +-----> Pcache tag parity error in left bank
|     |       (Section 15.8.1.2)
|     | otherwise
|     +-----> Inconsistent status (no PCSTS error bits set)
|             (Section 15.8.1.22)
| S_BCETSTS<LOCK>
+-----+ (select one)
|     |
|     | S_BCETSTS<UNCORR>
|     +-----+ (select one)
|     |     |
|     |     | S_BCETSTS<TS_CMD>=DREAD
|     |     +-----> Bcache tag store uncorrectable ECC error on D-stream read
|     |     |       (Section 15.8.1.3)
|     |     | S_BCETSTS<TS_CMD>=IREAD
|     |     +-----> Bcache tag store uncorrectable ECC error on I-stream read
|     |     |       (Section 15.8.1.3)
|     |     | S_BCETSTS<TS_CMD>=OREAD
|     |     +-----> Bcache tag store uncorrectable ECC error on write or
|     |     |       read-lock (Section 15.8.1.3)
|     |     | S_BCETSTS<TS_CMD>=WUNLOCK
|     |     +-----> Bcache tag store uncorrectable ECC error on write-unlock
|     |     |       (done only in ETM) (Section 15.8.1.3)
|     |     | S_BCETSTS<TS_CMD>=R_INVAL
|     |     +-----> Bcache tag store uncorrectable ECC error on writeback
|     |     |       request type of NDAL operation (Section 15.8.1.3)
|     |     | S_BCETSTS<TS_CMD>=O_INVAL
|     |     +-----> Bcache tag store uncorrectable ECC error on
|     |     |       writeback-and-invalidate type of NDAL operation (Section
|     |     | S_BCETSTS<TS_CMD>=IPR_DEALLOCATE
|     |     +-----> Bcache tag store uncorrectable ECC error on software
|     |     |       forced deallocate (Section 15.8.1.3)
|     |     | otherwise
|     |     +-----> Inconsistent status (invalid command)
|     |             (Section 15.8.1.22)
|     | S_BCETSTS<BAD_ADDR>
|     +-----+ (select one)
|     |     |
|     |     | S_BCETSTS<TS_CMD>=DREAD
|     |     +-----> Bcache tag store addressing error on D-stream read
|     |     |       (Section 15.8.1.3)
|     |     | S_BCETSTS<TS_CMD>=IREAD
|     |     +-----> Bcache tag store addressing error on I-stream read
|     |     |       (Section 15.8.1.3)
|     |     | S_BCETSTS<TS_CMD>=OREAD
|     |     +-----> Bcache tag store addressing error on write or
|     |     |       read-lock (Section 15.8.1.3)
|     |     | S_BCETSTS<TS_CMD>=WUNLOCK
|     |     +-----> Bcache tag store addressing error on write-unlock
|     |     |       (done only in ETM) (Section 15.8.1.3)
|     |     | S_BCETSTS<TS_CMD>=R_INVAL
|     |     +-----> Bcache tag store addressing error on writeback
|     |     |       request type of NDAL operation (Section 15.8.1.3)
|     |     | S_BCETSTS<TS_CMD>=O_INVAL
|     |     +-----> Bcache tag store addressing error on
|     |     |       writeback-and-invalidate type of NDAL operation (Section 15.8.1.3)
|     |     | S_BCETSTS<TS_CMD>=IPR_DEALLOCATE
|     |     +-----> Bcache tag store addressing error on software
|     |     |       forced deallocate (Section 15.8.1.3)
|     |     | otherwise
|     |     +-----> Inconsistent status (invalid command)
|     |             (Section 15.8.1.22)
|     | otherwise
|     +-----> Inconsistent status (no BCETSTS error bits set)
|             (Section 15.8.1.22)
| S_BCETSTS<LOST_ERR>
+-----> Lost unrecoverable Bcache tag store error
|       (Section 15.8.1.4)
| S_BCETSTS<CORR>
+-----+ (select one)
|     |
|     | S_BCETSTS<LOCK>
|     +-----> Lost Bcache tag store correctable error
|     |       (Section 15.8.1.6)
|     | otherwise
|     +-----+ (select one)
|           |
|           | S_BCETSTS<TS_CMD>=DREAD
|           +-----> Bcache tag store correctable ECC error on D-stream read
|           |       (Section 15.8.1.5)
|           | S_BCETSTS<TS_CMD>=IREAD
|           +-----> Bcache tag store correctable ECC error on I-stream read
|           |       (Section 15.8.1.5)
|           | S_BCETSTS<TS_CMD>=OREAD
|           +-----> Bcache tag store correctable ECC error on write or
|           |       read-lock (Section 15.8.1.5)
|           | S_BCETSTS<TS_CMD>=WUNLOCK
|           +-----> Bcache tag store correctable ECC error on write-unlock
|           |       (done only in ETM) (Section 15.8.1.5)
|           | S_BCETSTS<TS_CMD>=R_INVAL
|           +-----> Bcache tag store correctable ECC error on writeback
|           |       request type of NDAL operation (Section 15.8.1.5)
|           | S_BCETSTS<TS_CMD>=O_INVAL
|           +-----> Bcache tag store correctable ECC error on
|           |       writeback-and-invalidate type of NDAL operation (Section
|           | S_BCETSTS<TS_CMD>=IPR_DEALLOCATE
|           +-----> Bcache tag store correctable ECC error on software
|           |       forced deallocate (Section 15.8.1.5)
|           | otherwise
|           +-----> Inconsistent status (invalid command)
|                   (Section 15.8.1.22)
| S_BCEDSTS<CORR>
+-----+ (select one)
|     |
|     | S_BCEDSTS<LOCK>
|     +-----> Lost Bcache data RAM correctable error
|     |       (Section 15.8.1.8)
|     | otherwise
|     +-----+ (select one)
|           |
|           | S_BCEDSTS<DR_CMD>=DREAD
|           +-----> Bcache data RAM correctable error on D-stream read
|           |       (Section 15.8.1.7)
|           | S_BCEDSTS<DR_CMD>=IREAD
|           +-----> Bcache data RAM correctable error on I-stream read
|           |       (Section 15.8.1.7)
|           | S_BCEDSTS<DR_CMD>=WRITEBACK
|           +-----> Bcache data RAM correctable error on writeback
|           |       (Section 15.8.1.7)
|           | S_BCEDSTS<DR_CMD>=RMW
|           +-----> Bcache data RAM correctable error on read-modify-write
|           |       for write or write-unlock (Section 15.8.1.7)
|           | otherwise
|           +-----> Inconsistent status (invalid command)
|                   (Section 15.8.1.22)
| S_BCEDSTS<LOCK> AND
| NOT S_PCSTS<PTE_ER>
+-----+ (select one)
|     |
|     | S_BCEDSTS<UNCORR>
|     +-----+ (select one)
|     |     |
|     |     | S_BCEDSTS<DR_CMD>=DREAD
|     |     +-----> Bcache data RAM uncorrectable ECC error on D-stream read
|     |     |       (or Pcache fill for read-lock) (Section 15.8.1.9)
|     |     | S_BCEDSTS<DR_CMD>=IREAD
|     |     +-----> Bcache data RAM uncorrectable ECC error on I-stream read
|     |     |       (Section 15.8.1.9)
|     |     | S_BCEDSTS<DR_CMD>=WRITEBACK
|     |     +-----> Bcache data RAM uncorrectable ECC error on writeback
|     |     |       (Section 15.8.1.10)
|     |     | otherwise
|     |     +-----> Inconsistent status (all other cases cause hard error
|     |             interrupt) (Section 15.8.1.22)
|     | S_BCEDSTS<BAD_ADDR>
|     +-----+ (select one)
|     |     |
|     |     | S_BCEDSTS<DR_CMD>=DREAD
|     |     +-----> Bcache data RAM addressing error on D-stream read
|     |     |       (or Pcache fill for read-lock) (Section 15.8.1.9)
|     |     | S_BCEDSTS<DR_CMD>=IREAD
|     |     +-----> Bcache data RAM addressing error on I-stream read
|     |     |       (Section 15.8.1.9)
|     |     | S_BCEDSTS<DR_CMD>=WRITEBACK
|     |     +-----> Bcache data RAM addressing error on writeback
|     |     |       (Section 15.8.1.10)
|     |     | otherwise
|     |     +-----> Inconsistent status (all other cases cause hard error
|     |             interrupt) (Section 15.8.1.22)
|     | otherwise
|     +-----> Inconsistent Status (no error bits set in BCEDSTS)
|             (Section 15.8.1.22)
| S_BCEDSTS<LOST_ERR> AND
| NOT S_PCSTS<PTE_ER>
+-----+
|     |
|     | S_NESTS<BADWDATA> OR S_NESTS<LOST_OERR>
|     +-----> Lost unrecoverable Bcache data RAM error with possible
|     |       lost writeback error (Section 15.8.1.11)
|     | otherwise
|     +-----> Lost unrecoverable Bcache data RAM error
|             (Section 15.8.1.12)
| S_CEFSTS<LOCK> AND
| NOT S_PCSTS<PTE_ER>
+-----+ (select one)
|     |
|     | S_CEFSTS<TIMEOUT>
|     +-----+ (select one)
|     |     |
|     |     | S_CEFSTS<OREAD>
|     |     +-----+ (select one)
|     |     |     |
|     |     |     | S_CEFSTS<WRITE> AND
|     |     |     | NOT S_CEFSTS<TO_MBOX>
|     |     |     +-----+ (select one)
|     |     |     |     |
|     |     |     |     | S_CEFSTS<REQ_FILL_DONE>
|     |     |     |     +-----> Inconsistent status (should cause hard error interrupt)
|     |     |     |     |       (Section 15.8.1.22)
|     |     |     |     | otherwise
|     |     |     |     +-----> D-stream NDAL ownership read for Mbox write timeout
|     |     |     |             error before write data merged with fill data (Section 15
|     |     |     | S_CEFSTS<TO_MBOX>
|     |     |     +-----> D-stream NDAL ownership read timeout error
|     |     |     |       (modify operand or read-lock) (Section 15.8.1.13)
|     |     |     | otherwise
|     |     |     +-----> Inconsistent status (either WRITE or TO_MBOX, but not both,
|     |     |             should be set) (Section 15.8.1.22)
|     |     | otherwise
|     |     +-----+ (select one)
|     |           |
|     |           | S_CEFSTS<IREAD>
|     |           +-----> I-stream NDAL read timeout error (Section 15.8.1.13)
|     |           | S_CEFSTS<TO_MBOX>
|     |           +-----> D-stream NDAL read timeout error (read only operand)
|     |           |       (Section 15.8.1.13)
|     |           | otherwise
|     |           +-----> Inconsistent status (TO_MBOX should be set)
|     |                   (Section 15.8.1.22)
|     | S_CEFSTS<RDE>
|     +-----+ (select one)
|     |     |
|     |     | S_CEFSTS<OREAD>
|     |     +-----+ (select one)
|     |     |     |
|     |     |     | S_CEFSTS<WRITE>
|     |     |     | AND NOT S_CEFSTS<TO_MBOX>
|     |     |     +-----+ (select one)
|     |     |     |     |
|     |     |     |     | S_CEFSTS<REQ_FILL_DONE>
|     |     |     |     +-----> Inconsistent status (should cause hard error interrupt)
|     |     |     |     |       (Section 15.8.1.22)
|     |     |     |     | otherwise
|     |     |     |     +-----> D-stream NDAL ownership read for Mbox write read data
|     |     |     |             error before write data merged with fill data (Section 15.8.1.14)
|     |     |     | S_CEFSTS<TO_MBOX>
|     |     |     +-----> D-stream NDAL ownership read read data error
|     |     |     |       (modify operand or read-lock) (Section 15.8.1.14)
|     |     |     | otherwise
|     |     |     +-----> Inconsistent status (either WRITE or TO_MBOX, but not both,
|     |     |             should be set) (Section 15.8.1.22)
|     |     | otherwise
|     |     +-----+ (select one)
|     |           |
|     |           | S_CEFSTS<IREAD>
|     |           +-----> I-stream NDAL read read data error
|     |           |       (Section 15.8.1.14)
|     |           | S_CEFSTS<TO_MBOX>
|     |           +-----> D-stream NDAL read read data error (read only operand)
|     |           |       (Section 15.8.1.14)
|     |           | otherwise
|     |           +-----> Inconsistent status (TO_MBOX should be set)
|     |                   (Section 15.8.1.22)
|     | otherwise
|     +-----> Inconsistent status (either CEFSTS<RDE> or CEFSTS<TIMEOUT>
|             should be set or, if CEFSTS<UNEXPECTED_FILL> is set, it
|             should cause a hard error interrupt) (Section 15.8.1.22)
| S_CEFSTS<LOST_ERR> AND
| NOT S_PCSTS<PTE_ER>
+-----> Lost Bcache fill error
|       (Section 15.8.1.15)
| S_NESTS<NOACK> AND
| NOT S_PCSTS<PTE_ER>
+-----+ (select one)
|     |
|     | S_NEOCMD<CMD>=IREAD
|     +-----> Unacknowledged I-stream NDAL read (Section 15.8.1.16)
|     |
|     | S_NEOCMD<CMD>=DREAD
|     +-----> Unacknowledged D-stream NDAL read (read only operand)
|     |       (Section 15.8.1.16)
|     | S_NEOCMD<CMD>=OREAD
|     +-----> Unacknowledged D-stream NDAL read (modify operand or read-lock)
|     |       (Section 15.8.1.16)
|     | S_NEOCMD<CMD>=WRITE or WDISOWN
|     +-----> Inconsistent status (should cause hard error interrupt)
|     |       (Section 15.8.1.22)
|     | otherwise
|     +-----> Inconsistent status (invalid command in NEOCMD<CMD>)
|             (Section 15.8.1.22)
| S_NESTS<LOST_OERR> AND
| NOT S_PCSTS<PTE_ER>
+-----> Lost NDAL output error (Section 15.8.1.17)
| S_BCEDSTS<LOCK> AND
| S_PCSTS<PTE_ER> 1
+-----+ (select one)
|     |
|     | S_BCEDSTS<UNCORR>
|     +-----+ (select one)
|     |     |
|     |     | S_BCEDSTS<DR_CMD>=DREAD
|     |     +-----> Bcache data RAM uncorrectable ECC error on PTE read
|     |     |       (Section 15.8.1.18.1)
|     |     | S_BCEDSTS<DR_CMD>=IREAD
|     |     +-----+ (select one)
|     |     |     |
|     |     |     | S_BCEDSTS<LOST_ERR>
|     |     |     +-----> Multiple errors in context of PTE read error
|     |     |     |       (Section 15.8.1.18.5)
|     |     |     | otherwise
|     |     |     +-----> Bcache data RAM uncorrectable ECC error on I-stream read
|     |     |             (Section 15.8.1.9)
|     v     v
|     2     3
1 At least one potential PTE cause must be found or the status is inconsistent (see Section 15.8.1.22). Some of the outcomes
indicate a potential soft error interrupt cause which is not a potential PTE read error cause. These errors should be
treated separately.
|     |     | S_BCEDSTS<DR_CMD>=WRITEBACK
|     |     +-----+ (select one)
|     |     |     |
|     |     |     | S_BCEDSTS<LOST_ERR>
|     |     |     +-----> Multiple errors in context of PTE read error
|     |     |     |       (Section 15.8.1.18.5)
|     |     |     | otherwise
|     |     |     +-----> Bcache data RAM uncorrectable ECC error on writeback
|     |     |             (Section 15.8.1.10)
|     |     | otherwise
|     |     +-----> Inconsistent status (all other cases cause hard error
|     |             interrupt) (Section 15.8.1.22)
|     | S_BCEDSTS<BAD_ADDR>
|     +-----+ (select one)
|     |     |
|     |     | S_BCEDSTS<DR_CMD>=DREAD
|     |     +-----> Bcache data RAM addressing error on PTE read
|     |     |       (Section 15.8.1.18.1)
|     |     | S_BCEDSTS<DR_CMD>=IREAD
|     |     +-----+ (select one)
|     |     |     |
|     |     |     | S_BCEDSTS<LOST_ERR>
|     |     |     +-----> Multiple errors in context of PTE read error
|     |     |     |       (Section 15.8.1.18.5)
|     |     |     | otherwise
|     |     |     +-----> Bcache data RAM addressing error on I-stream read
|     |     |             (Section 15.8.1.9)
|     |     | S_BCEDSTS<DR_CMD>=WRITEBACK
|     |     +-----+ (select one)
|     |     |     |
|     |     |     | S_BCEDSTS<LOST_ERR>
|     |     |     +-----> Multiple errors in context of PTE read error
|     |     |     |       (Section 15.8.1.18.5)
|     |     |     | otherwise
|     |     |     +-----> Bcache data RAM addressing error on writeback
|     |     |             (Section 15.8.1.10)
|     |     | otherwise
|     |     +-----> Inconsistent status (all other cases cause hard error
|     |             interrupt) (Section 15.8.1.22)
|     | otherwise
|     +-----> Inconsistent Status (no error bits set in BCEDSTS)
|             (Section 15.8.1.22)
| S_CEFSTS<LOCK> AND
| S_PCSTS<PTE_ER> 1
+-----+ (select one)
|     |
|     | S_CEFSTS<TIMEOUT>
|     +-----+ (select one)
|     |     |
|     |     | S_CEFSTS<OREAD>
|     |     +-----+ (select one)
|     |     |     |
|     |     |     | S_CEFSTS<WRITE> AND
|     |     |     | NOT S_CEFSTS<TO_MBOX>
|     |     |     +-----+ (select one)
|     |     |     |     |
|     |     |     |     | S_CEFSTS<REQ_FILL_DONE>
|     |     |     |     +-----> Inconsistent status (should cause hard error interrupt)
|     |     |     |     |       (Section 15.8.1.22)
|     |     |     |     | otherwise
|     |     |     |     +-----+ (select one)
|     |     |     |           |
|     |     |     |           | S_CEFSTS<LOST_ERR>
|     |     |     |           +-----> Multiple errors in context of PTE read error
|     |     |     |           |       (Section 15.8.1.18.5)
|     |     |     |           | otherwise
|     |     |     |           +-----> D-stream NDAL ownership read for Mbox write timeout
|     |     |     |                   error before write data merged with fill data (Section 1
|     |     |     | S_CEFSTS<TO_MBOX>
|     |     |     +-----+ (select one)
|     |     |     |     |
|     |     |     |     | S_CEFSTS<LOST_ERR>
|     |     |     |     +-----> Multiple errors in context of PTE read error
|     |     |     |     |       (Section 15.8.1.18.5)
|     |     |     |     | otherwise
|     |     |     |     +-----> D-stream NDAL ownership read timeout error
|     |     |     |             (modify operand or read-lock) (Section 15.8.1.13)
|     |     |     | otherwise
|     |     |     +-----> Inconsistent status (either WRITE or TO_MBOX, but not both,
|     |     |             should be set) (Section 15.8.1.22)
|     |     | otherwise
|     |     +-----+ (select one)
|     |           |
|     |           | S_CEFSTS<IREAD>
|     |           +-----+ (select one)
|     |           |     |
|     |           |     | S_CEFSTS<LOST_ERR>
|     |           |     +-----> Multiple errors in context of PTE read error
|     |           |     |       (Section 15.8.1.18.5)
|     |           |     | otherwise
|     |           |     +-----> I-stream NDAL read timeout error (Section 15.8.1.13)
|     |           | S_CEFSTS<TO_MBOX>
|     |           +-----> D-stream NDAL read timeout error (PTE read)
|     |           |       (Section 15.8.1.18.2)
|     |           | otherwise
|     |           +-----> Inconsistent status (TO_MBOX should be set)
|     |                   (Section 15.8.1.22)



Figure 15-10 Cont'd on next page 
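
As an illustration of how one branch of Figure 15-10 might be evaluated in software, the Python sketch below decodes the S_BCETSTS<LOCK> subtree. It is a hypothetical restatement of the figure, not NVAX firmware: the function name and the use of strings for S_BCETSTS<TS_CMD> values are assumptions, and no real bit encodings are implied.

```python
# Operation described by each S_BCETSTS<TS_CMD> value, as listed in Figure 15-10.
TS_CMD_OPERATION = {
    "DREAD": "D-stream read",
    "IREAD": "I-stream read",
    "OREAD": "write or read-lock",
    "WUNLOCK": "write-unlock (done only in ETM)",
    "R_INVAL": "writeback request type of NDAL operation",
    "O_INVAL": "writeback-and-invalidate type of NDAL operation",
    "IPR_DEALLOCATE": "software forced deallocate",
}


def decode_bcetsts_lock(uncorr, bad_addr, ts_cmd):
    """Evaluate the S_BCETSTS<LOCK> branch; the caller has checked LOCK is set."""
    if uncorr and bad_addr:
        # (select one) requires exactly one true case.
        return "Inconsistent status"
    if not (uncorr or bad_addr):
        return "Inconsistent status (no BCETSTS error bits set)"
    operation = TS_CMD_OPERATION.get(ts_cmd)
    if operation is None:
        return "Inconsistent status (invalid command)"
    kind = "uncorrectable ECC error" if uncorr else "addressing error"
    return "Bcache tag store " + kind + " on " + operation
```

The same table-driven approach extends to the S_BCETSTS<CORR> and S_BCEDSTS branches, which differ only in the command field and the section each outcome refers to.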



Note that the initial packets of ISR data contain data from before the load event from the last
bit on the chain. After one or two samples, this data is all valid sampled data. The bits from the
scan chain are serial-to-parallel converted as shown in Table 19-3. Note that for ISR1, 9 bits are
always visible. Every third NVAX cycle, they shift up by three bit positions.

Table 19-3: Serial to Parallel Conversion of Scan Data 



PP_DATA_H Bit       Bit from Scan Chain

ISR1
PP_DATA_H<5>        Most recently received bit
PP_DATA_H<4>        Second most recently received bit
PP_DATA_H<3>        Third most recently received bit
PP_DATA_H<8:6>      Last PP_DATA_H<5:3> (from 3 NVAX cycles ago)
PP_DATA_H<11:9>     Last PP_DATA_H<8:6> (from 3 NVAX cycles ago)

ISR2
PP_DATA_H<2>        Most recently received bit
PP_DATA_H<1>        Second most recently received bit
PP_DATA_H<0>        Least recently received bit
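
The ISR1 rows of Table 19-3 can be read as a 9-bit window that moves up by three positions for every three scan bits received. The following Python sketch models that behavior only; the function name and the integer encoding (bit 0 stands for PP_DATA_H<3>, bit 8 for PP_DATA_H<11>) are illustrative assumptions and are not taken from the chip design.

```python
def isr1_windows(scan_bits):
    """Model the 9-bit ISR1 view on PP_DATA_H<11:3> (illustrative only).

    Returns one 9-bit integer per complete group of three scan bits.
    Bit 0 of each integer stands for PP_DATA_H<3>, bit 8 for PP_DATA_H<11>.
    """
    window = 0
    snapshots = []
    usable = len(scan_bits) - len(scan_bits) % 3  # ignore a trailing partial group
    for i in range(0, usable, 3):
        third, second, newest = scan_bits[i:i + 3]
        # PP_DATA_H<5> = newest bit, <4> = second newest, <3> = third newest
        group = (newest << 2) | (second << 1) | third
        # Older groups move up by three positions; the oldest group drops off
        window = ((window << 3) | group) & 0x1FF
        snapshots.append(window)
    return snapshots
```

After three groups the window holds only sampled data, which matches the remark that the first one or two packets still contain pre-load bits.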
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Observe MAB 

For full-speed MAB observation, an internal clock is provided which allows synchronous
capture by a DAS in any debug environment. Figure 19-3 shows the self-relative timing
during Observe MAB mode.

Figure 19-3: Self Relative Timing in Observe MAB Mode 



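[Timing diagram not reproduced. The figure marks an interval of 1 NVAX cycle; the signal
labels recoverable from the scan are PP_DATA<10:0> and PP_DATA<11> (NVAX PHI 2).]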



Force MAB 

During Force MAB mode, an internal 11-bit counter forces an address onto the microaddress bus.
The counter is initialized internally by the Ebox. It is incremented each time Force MAB mode
is entered, thus allowing it to go through all control store addresses. Refer to the testability
sections of the Micro-Sequencer chapter for further details of Force MAB operation.

Observe Box Signals 

The timing for observing internal signals from boxes follows the same basic pattern as that for
observing MAB. Note that PP_DATA_H<11> may be used for observing a box-specific signal. Details
of the signals observed may be found in the testability section of each box chapter.

19.5 Test Pads 

This port consists of strategic internal nodes brought out to the top level of metal in the form of
3x3 micron test pads. These pads will be accessed by probes during chip debug and wafer probe
manufacturing tests. The access primarily provides observability of these nodes; however,
controllability may also be provided where appropriate. See the testability sections in box chapters
for the list of nodes brought out on the top metal layer.

19.6 System Port 

This is simply the normal system I/O of the chip. It is identified as a test access port for
two reasons:

• It is used to provide the read/write access to testability features via the VAX architecture's
MFPR and MTPR instructions.



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• It provides the natural resource for testing the chip via the macro-code based tests. 
See the individual box chapters for the list of specific architectural features provided. 

19.7 Serial P-cache Port

Instruction stream data may be serially loaded into the P-cache by supplying data on the
TEST_DATA_H pin and strobing it with the TEST_STROBE_H pin. Chip microcode collects the
bit-serial data, packs it into longwords, and writes the longwords into the P-cache. After loading
the P-cache, the microcode passes control to the first MACRO instruction in P-cache.

The serial load follows this flow: 

• TEST_STROBE_H is de-asserted while ASYNCH_RESET_L is asserted. TEST_STROBE_H 
is normally pulled up through on-chip resistors. 

• When ASYNCH_RESET_L is de-asserted, the on-chip power-up microcode enters the special 
burn-in flow. 

• When MCHK_H is asserted, TEST_STROBE_H should be de-asserted. The chip is now ready 
to receive serial data input. 

• The first bit of instruction stream data should be placed on TEST_DATA_H. Then
TEST_STROBE_H should be asserted.

• TEST_STROBE_H should then be de-asserted. TEST_DATA_H can change on the same edge
as the TEST_STROBE_H de-assertion.

• TEST_STROBE_H may transition at a maximum rate of 1/10 the internal chip clock
frequency. There is no minimum rate.

• 32K bits of instruction stream data must be loaded into cache. At this time, MCHK_H will be 
de-asserted, signifying the cache load is complete. The chip then jumps to the first location 
in P-cache, attempting to execute an instruction at that location. 
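
The packing step performed by the microcode can be sketched in Python as follows. This is a model for test-pattern preparation only, not the microcode itself. The bit order within a longword is not stated in this section, so the sketch assumes that the first bit received becomes bit 0; `pack_longwords` and `PCACHE_LOAD_BITS` are hypothetical names.

```python
PCACHE_LOAD_BITS = 32 * 1024  # "32K bits" of instruction stream data, per the text


def pack_longwords(bits):
    """Pack a bit-serial stream into 32-bit longwords.

    Assumption (not from the specification): the first bit received
    becomes bit 0 of each longword.
    """
    if len(bits) % 32:
        raise ValueError("stream must be a whole number of longwords")
    longwords = []
    for i in range(0, len(bits), 32):
        value = 0
        for position, bit in enumerate(bits[i:i + 32]):
            value |= (bit & 1) << position
        longwords.append(value)
    return longwords
```

Under this assumption a complete load of 32K bits yields 1024 longwords.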

It is difficult to achieve high test coverage in the burn-in and life-test environments due to
limited test pattern bandwidth and the difficulty in synchronizing test equipment to the NVAX
chip. Using this serial port, burn-in and life-test programs can load the real "test program" into
P-cache, where the chip can perform a self-test.

This scheme minimizes test pattern bandwidth, allows for asynchronous transmission of the serial 
data, provides a means to stimulate multiple chips under test which are running asynchronously, 
and supplies a means to achieve high test coverage. 

19.8 IEEE 1149.1 (JTAG) Serial Test Port 

The Serial Test Port is a 4-pin test access interface based on the IEEE 1149.1 standard. (See [2], [3].)
In NVAX it is used for accessing and controlling the boundary scan register. The port supports
the EXTEST, SAMPLE and BYPASS instructions.
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Figure 19-4: Serial Port Timing 




The block diagram of the port logic together with the boundary scan register is shown in
Figure 19-5. The port logic shown represents all the logic used in the definition of the Common
Test Interface (see [2]). It consists of the four-wire Test Access Port (TAP), a TAP controller, an
instruction register (IR) and a bypass register (BPR).

The four pins in the test access port are TDI_H, TDO_H, TMS_H, and TCK_H. These pins conform
to all requirements of the standard. The port also uses the PP_CMD_H<0> pin as a pseudo-TRST_L
pin. When asserted low, this pin resets the JTAG test logic. See Section 19.8.5 for more details.

The TAP Controller is a state machine which interprets IEEE 1149.1 protocols received on the TMS
line and generates appropriate clocks and control signals for the testability features under its
jurisdiction.
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IEEE 1149.1 Serial Port (the Basic CTI) 
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The Instruction Register resides on a scan path. Its contents are interpreted as test instructions 
and are used to select the testability modes and features. 

The Bypass Register is a one bit shift register which provides a single-bit serial connection through 
the port (chip) when no other test path is selected. 



19.8.1 TAP Controller State Machine 

The TAP Controller is a synchronous finite-state machine that interprets IEEE 1149.1
protocols received on the TMS line. The state transitions in the controller are caused by the TMS
signal on the rising edge of TCK. In each state, the controller generates appropriate clocks and
control signals that control the operation of the testability features. Appropriate actions of the
testability features are initiated on the rising edge of TCK following the entry into a state.

The TAP controller states provide the four basic actions required for testing: transportation of
test data (Shift), stimulus application (Update), test execution (Run-Test), and response capture
(Capture). Test data are generally transported at the beginning and at the end of a test.

The state diagram for the TAP controller is shown in Figure 19-6. The TAP controller causes 
appropriate actions to occur only in the testability features selected by the current instruction in 
the instruction register. All other testability features maintain status quo. Status quo means that 
the registers either retain their previous state or continue to operate in their previously selected 
mode. 
A Scan Sequence begins with entry into the Capture state and ends with the exit from the Update
state. The Scan Sequence entered from Select-IR-Scan controls the instruction register, and
the one entered from Select-DR-Scan controls the testability feature selected by the instruction
register. The actions caused by the states in the two scan sequences are identical. The following
is a brief description of each state.



Figure 19-6: TAP Controller State Machine 



Values Shown are TMS 




• Test-Logic-Reset: This state disables the test logic. The chip performs normal system
operation. Testability features are either inactive or are performing normal system operation.
The TAP controller is forced into this state at power-up and it continues to remain in this
state as long as TMS is held high.

• Run-Test/Idle: This is a combined controller state between scan operations when the test
logic is either idle or a particular test is running.

For example, upon entry into this state, an internal test (such as self-test or macrocode 
test involving data reducers etc) selected by the current instruction is executed. All other 
testability features (not involved in the current test) maintain status quo. 
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• Select-DR-Scan: This is a temporary controller state in which all test registers (instruction 
register as well as testability features) maintain status quo. If TMS is held low when the
controller is in this state, then a scan sequence for the selected test feature is initiated. 

• Select-IR-Scan: This is a temporary controller state in which all test registers maintain
status quo. If TMS is held low when the controller is in this state, then a scan sequence for 
the Instruction Register is initiated. 

• Capture: In this controller state, the chip data is parallel loaded into the selected test 
register (Instruction Register or testability feature). This is the state in which the observe 
action takes place. 

• Shift: In this state the selected test register shifts data one stage towards its serial output 
on each rising edge of TCK. 

• Exit1: This is a temporary controller state where all test registers maintain status quo.

• Pause: This controller state allows shifting of the selected test register to be temporarily 
halted. All test registers maintain status quo. 

• Exit2: This is a temporary controller state. All test registers maintain status quo. 

• Update: The selected test register updates its outputs by transferring data from the shifter- 
stage into parallel output stage. This update action is initiated on the first falling edge of 
TCK upon entry into the state. All other registers maintain status quo. 
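
Figure 19-6 is not legible in this copy, so the sketch below uses the standard IEEE 1149.1 state transitions. It assumes the NVAX controller follows the standard unchanged, which the text implies but this section does not tabulate. The names `TAP_NEXT` and `tap_walk` are illustrative. Each entry gives the next state for TMS = 0 and TMS = 1, sampled on the rising edge of TCK.

```python
# Next-state table: state -> (next state if TMS = 0, next state if TMS = 1).
# Transitions are those of the standard IEEE 1149.1 TAP controller (assumed here).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
}
for reg in ("DR", "IR"):
    # The DR and IR scan sequences have identical structure.
    TAP_NEXT.update({
        f"Capture-{reg}": (f"Shift-{reg}", f"Exit1-{reg}"),
        f"Shift-{reg}": (f"Shift-{reg}", f"Exit1-{reg}"),
        f"Exit1-{reg}": (f"Pause-{reg}", f"Update-{reg}"),
        f"Pause-{reg}": (f"Pause-{reg}", f"Exit2-{reg}"),
        f"Exit2-{reg}": (f"Shift-{reg}", f"Update-{reg}"),
        f"Update-{reg}": ("Run-Test/Idle", "Select-DR-Scan"),
    })


def tap_walk(state, tms_bits):
    """Apply a sequence of TMS values (one per rising TCK edge) and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state
```

For example, `tap_walk("Test-Logic-Reset", [0, 1, 1, 0, 0])` ends in Shift-IR, and five consecutive TMS = 1 values return the controller to Test-Logic-Reset from any state.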

19.8.2 Instruction Register

The JTAG Instruction Register on the NVAX CPU consists of 2 bits. The two bits are interpreted as
per Table 19-4 to select and control the operation of the boundary scan register. During the
Capture-IR state, the shift register stage of the IR is loaded with data '01'. This automatic load
feature is useful for testing the integrity of the JTAG scan chain on the module.



Table 19-4: Instruction Register

IR<1:0>   Test Register Selected    Test Instruction/Operation

00        Boundary Scan Register    EXTEST. Also forces reset to internal chip logic.
01        Boundary Scan Register    SAMPLE
10        Bypass Register           BYPASS
11        Bypass Register           BYPASS. Default
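
The decode in Table 19-4 is small enough to write as a lookup. The Python sketch below is illustrative only. The constant and function names are assumptions, and the order in which the two bits are shifted is not modeled.

```python
# IR<1:0> value -> (test register selected, instruction), following Table 19-4.
IR_DECODE = {
    0b00: ("Boundary Scan Register", "EXTEST"),
    0b01: ("Boundary Scan Register", "SAMPLE"),
    0b10: ("Bypass Register", "BYPASS"),
    0b11: ("Bypass Register", "BYPASS"),
}

IR_RESET_VALUE = 0b11    # IR output latches are set high on reset (BYPASS)
IR_CAPTURE_VALUE = 0b01  # value loaded into the shift stage during Capture-IR


def decode_ir(value):
    """Return (register, instruction) for a 2-bit IR value."""
    return IR_DECODE[value & 0b11]


def forces_chip_reset(value):
    """EXTEST also forces a reset to the internal chip logic."""
    return (value & 0b11) == 0b00
```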



A cell used in the instruction register is shown in Figure 19-7. The ir_cell operations are
controlled by the IR_CAPTURE_H, IR_SHIFT_C1, IR_SHIFT_C2, IR_UPDATE_H and IR_RESET_L signals.
These signals are described later.



DIGITAL CONFIDENTIAL 






NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 
Figure 19-7: JTAG Instruction Register Cell 
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19.8.3 Bypass Register 

The bypass register provides a one-bit scan route through the NVAX chip during scan-shift
operations. It provides a means for effectively bypassing the NVAX CPU chip's test logic during
testing at module and system levels.

When the bypass register is selected, a CAPTURE-DR controller state loads a '0' in the bypass 
register. When the JTAG instruction selects the Bypass operation, Bypass register is selected for 
the scan operation. 

19.8.4 Control Dispatch Logic 

Dispatch logic generates signals to control operations of the JTAG circuitry, including the
instruction register and the driver on the TDO_H pin. It decodes the current instruction in the IR
and the current TAP controller state information and dispatches the control signals to the bypass
and boundary scan registers. The control signals dispatched are described below.

Dispatch to Boundary Scan Register 

• BSR_EXTEST_H: Asserted high when the instruction register selects the EXTEST instruction. This
allows boundary scan cells to drive data on output and I/O pins. BSR_EXTEST_H also forces an
internal reset to chip logic. This makes the chip's internal logic insensitive to test patterns
used for the interconnection test.

• BSR_CAPTURE_H: The signal is asserted when the TAP controller enters the CAPTURE-DR state
and deasserted when the TAP Controller exits the CAPTURE-DR state. The signal causes data
to be observed into the boundary scan register.

• BSR_SHIFT_C1: Issues a pulse with the falling edge of TCK_H during the CAPTURE-DR and
SHIFT-DR states.

• BSR_SHIFT_C2: Unconditionally issues a pulse with each rising edge of TCK_H.

• BSR_UPDATE_H: Issues a pulse with the falling edge of TCK_H during the UPDATE-DR state.
This pulse loads new data into the parallel output latch in the md_bcells described later.
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Dispatch to Bypass Register 

Dispatch to Bypass Register consists of the BSR_CAPTURE_H, BSR_SHIFT_C1, and BSR_SHIFT_C2
signals. Note that these are a subset of the signals dispatched to the boundary scan register.

Dispatch to Instruction Register 

• IR_CAPTURE_H: The signal is asserted when the TAP controller enters the CAPTURE-IR state
and deasserted when the TAP Controller exits the CAPTURE-IR state. The signal causes status
data (01) to be observed into the IR.

• IR_SHIFT_C1: Issues a pulse with the falling edge of TCK_H during the CAPTURE-IR and
SHIFT-IR states.

• IR_SHIFT_C2: Unconditionally issues a shift pulse with each rising edge of TCK_H.

Note that the data shifts from the most significant bit to the least significant bit. The least
significant bit is at TDO_H.

• IR_UPDATE_H: Issues a pulse with the falling edge of TCK_H during the UPDATE-IR state. This
pulse loads the new instruction into the parallel output latches of the IR.

• IR_RESET_L: This signal initializes the instruction register's output latches. When asserted
low, all IR output latches are set high to force the BYPASS instruction. IR_RESET_L is asserted
low when the TAP Controller enters the Test-Logic-Reset state.

Dispatch to TDO Multiplexers and Driver 

Multiplexer control is dispatched by decoding the instruction register as per Table 19-4.
ENABLE_TDO_H is generated as follows.

• ENABLE_TDO_H: This signal is asserted high when the TAP controller is in the SHIFT-IR or
SHIFT-DR states. The signal enables the TDO_H pin driver whenever a shift operation is in
progress and keeps it disabled at all other times.

Figure 19-8 and Figure 19-9 show the timing diagram of the signals dispatched by the Control 
Dispatch Logic and the behavior of the Boundary Scan Register and the Instruction Register 
during the IR-Scan and DR-Scan sequences. 

Notice that the implementation must meet the standard's requirement that changes on TDO_H
occur with the falling edge of the TCK_H signal. In the NVAX CPU this requirement is met by
including a timing latch at the TDO_H pin. The latch opens when TCK_H is low and closes when
TCK_H is high.
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Instruction Register Scan (Example: Load EXTEST Instruction) 
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19.8.5 Initialization 

The TAP Controller and the Instruction Register's output latches are initialized by
PP_CMD_H<0>. When the PP_CMD_H<0> pin is asserted low, the TAP controller is forced to enter the
Test-Logic-Reset state and the IR is forced to the BYPASS instruction.

During Test-Logic-Reset state, all JTAG logic, including boundary scan register, is in inactive 
state. That is, the chip performs normal system functions. The boundary scan logic is set to a 
passive sample (observe) mode. TAP controller leaves this state only when a JTAG test operation 
is desired and an appropriate sequence is sent on the TMS_H and TCK_H pins.

NOTE 

The PP_CMD_H<0> pin on the NVAX CPU acts like a pseudo-TRST_L pin. Since
this pin is internally pulled up, a system designer must make provision to assert
the pin low, at least during the power-up operation. This will keep all JTAG
circuits inactive and allow the system to wake up normally in system mode.

19.9 Boundary Scan Registers 

The NVAX CPU chip's boundary scan register primarily facilitates interconnection test on module 
during module manufacturing and field service. Uses during other life cycle testing phases may 
also be possible. 

The boundary scan register is a single shift register formed by boundary scan cells placed at most 
of the chip's signal pins. The register is accessed via the JTAG port's TDI_H and TDO_H pins. 
Its operation is controlled by the control dispatch received from the JTAG Port. 

19.9.1 Boundary Scan Register Cells 

The NVAX chip uses four main types of boundary scan cells. 



in_bcell: Used on input-only pins. Figure 19-10 shows the block diagram. The bcell basically
consists of a 1-bit shift register. The cell supports the Sample and Shift functions.

out_bcell: Used on output-only pins. Figure 19-11 shows the block diagram. Besides the shift
register, the cell has an output multiplexer. The cell supports the following functions: Sample,
Shift, Drive outputs. The cell is used at miscellaneous output-only pins.

io_bcell: Used on bi-directional pins. Figure 19-12 shows the block diagram. The cell is identical
to the out_bcell cell except that it captures test data from the incoming data line. The cell
supports the Sample, Shift, and Drive output functions. It is used at all I/O pins.






md_bcell: Used on certain special pins and internal signals. For example, this cell is used on the
TS_WE_L, TS_OE_L, DR_WE_L, DR_OE_L pins and on internal driver enable signals for
bi-directional busses. Figure 19-13 shows the block diagram of an md_bcell. The cell builds upon the
out_bcell. It has a third output latch which holds data at the output steady while a shift operation
is in progress. The cell supports the Sample, Shift, Drive output, and Hold output functions.



Figure 19-10: in_bcell Boundary Scan Cell 
Figure 19-11: out_bcell Boundary Scan Cell
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Figure 19-12: io_bcell Boundary Scan Cell 
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Figure 19-13: md_bcell Boundary Scan Cell 
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NOTE 

Caution: In NVAX CPU chip, when Boundary Scan Register is shifting data in EXTEST 
mode (that is, when bsr_extest_h is asserted) the shifting of data is transferred to the 
pins and is visible to the other components connected to the pins. 

Since the back-up cache interface pins are connected to RAMs which do not have bound- 
ary scan on them, the protection is provided by extra logic in the bcells on R/W bits. 
This is explained later. 
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19.9.2 Boundary Scan Register Organization 

The boundary scan register on the NVAX CPU chip is 243 bits long. Table 19-5 lists all the signal
pins and the associated boundary scan register cell types. The pins are listed in the order of
their connection from the TDI_H pin to the TDO_H pin. Thus, the cell on the internal signal
C_PAD_N%NDAL_OUT_DRV_H is closest to the TDI_H pin and the cell on pin CHIP_ID_H<11> is
farthest from the TDI_H pin. In an entry with more than one pin, the cell on the first pin is closer
to the TDI_H pin. On-chip fuses provide a means to program each die with a unique ID number
which can be used to trace a packaged part back to the lot, wafer, and die location of origin for
yield analysis. Although it is not part of the chip boundary, the twelve-bit CHIP_ID_H<0:11> is
connected to the boundary scan chain so that the ID can be easily accessed through the JTAG port.
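
Because the pins are listed in chain order, the distance of any cell from TDI_H follows from a running sum of the cell counts. The Python sketch below shows this for the first few rows of Table 19-5 only. Pins whose cell type is "none" contribute no cells, and `cell_offsets` and the row tuples are illustrative names, not part of the specification.

```python
# (signal name, pin count, BSR cell type or None), first rows of Table 19-5 in chain order.
ROWS = [
    ("C_PAD_N%NDAL_OUT_DRV_H", 1, "md_bcell"),
    ("NDAL_H<32:63>", 32, "io_bcell"),
    ("OSC_H, OSC_L", 2, None),
    ("OSC_TEST_H", 1, None),
    ("OSC_TC1_H, OSC_TC2_H", 2, None),
    ("PHI12_OUT_H, PHI23_OUT_H", 2, None),
    ("PHI41_OUT_H, PHI34_OUT_H", 2, None),
    ("SYS_RESET_L", 1, "md_bcell"),
    ("ASYNC_RESET_L", 1, "in_bcell"),
    ("DISABLE_OUT_L", 1, "in_bcell"),
]


def cell_offsets(rows):
    """Map each scanned entry to the offset of its first cell, counted from TDI_H."""
    offsets = {}
    position = 0
    for name, count, cell in rows:
        if cell is None:  # no boundary scan cell on these pins
            continue
        offsets[name] = position
        position += count
    return offsets
```

With these rows, the SYS_RESET_L cell sits 33 cells after the cell nearest TDI_H.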



Table 19-5: Boundary Scan Register Organization 








Signal Name                  Count  Pin type            BSR Cell Type  Remarks

C_PAD_N%NDAL_OUT_DRV_H       1      Int signal          md_bcell       Int Signal
NDAL_H<32:63>                32     I/O, tri, 4 pts     io_bcell
OSC_H, OSC_L                 2      In                  none
OSC_TEST_H                   1      In                  none
OSC_TC1_H, OSC_TC2_H         2      In                  none
PHI12_OUT_H, PHI23_OUT_H     2      Out, ID, 4R         none
PHI41_OUT_H, PHI34_OUT_H     2      Out, ID, 4R         none
SYS_RESET_L                  1      Out                 md_bcell
ASYNC_RESET_L                1      In, ID, 3R          in_bcell
DISABLE_OUT_L                1      In, ID, 3R          in_bcell
TEST_STROBE_H                1      In, ptp             in_bcell
TEST_DATA_H                  1      In, ptp             in_bcell
IRQ_L<0:3>                   4      In, Op dr, 3D, 1R   in_bcell
H_ERR_L, S_ERR_L             2      In, Op dr, 3D, 1R   in_bcell
INT_TIM_L                    1      In, ptp             in_bcell
PWRFL_L, HALT_L              2      In, ptp             in_bcell
MACHINE_CHECK_H              1      Out, ptp            out_bcell
TEMP_H                       1      Out                 none
PP_CMD_H<0:2>                3      In, pull-up         none
PP_DATA_H<0:11>              12     Out                 none
TS_TAG_H<17:31>              15     I/O, tri, 7 pts     io_bcell
TS_ECC_H<0:5>                6      I/O, tri, 7 pts     io_bcell
TS_OWNED_H, TS_VALID_H       2      I/O, tri, 7 pts     io_bcell
C_PAD_T%EN_TS_DRV_H          1      Int signal          md_bcell       Int signal
TS_INDEX_H<5:20>             16     Out, 6 pts          out_bcell
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Table 19-5 (Cont): Boundary Scan Register Organization 



Signal Name                  Count         Pin type            BSR Cell Type  Remarks

TS_OE_L, TS_WE_L             2             Out, 6 pts          md_bcell
DR_INDEX_H<3:20>             18            Out, 8 pts          out_bcell
DR_OE_L, DR_WE_L             2             Out, 8 pts          md_bcell
C_PAD_D%EN_DR_DRV_H          1             Int signal          md_bcell       Int Signal
DR_DATA_H<0:23>              24            I/O, tri, 19 pts    io_bcell
DR_ECC_H<0:7>                8             I/O, tri, 19 pts    io_bcell
DR_DATA_H<24:63>             40            I/O, tri, 19 pts    io_bcell
CPU_WE_ONLY_L                1             [illegible]         in_bcell
ACK_L                        1             I/O, Op dr, 4 pts   io_bcell
CPU_SUPRESS_L                1             Out, ptp            [illegible]
CPU_HOLD_L                   1             Out, ptp            out_bcell
CPU_REQ_L                    1             Out, ptp            out_bcell
CPU_GRANT_L                  1             In, ptp             in_bcell
CMD_H<0:3>                   4             I/O, tri, 4 pts     io_bcell
ID_H<0:2>                    3             I/O, tri, 4 pts     io_bcell
PARITY_H<0:2>                3             I/O, tri, 4 pts     io_bcell
[illegible]                  [illegible]   I/O, tri, 4 pts     io_bcell
CHIP_ID_H<0:11>              12            Int signal          in_bcell       Int Signal
PHI12_IN_H, PHI23_IN_H       2             In, ID, 4R          none
PHI41_IN_H, PHI34_IN_H       2             In, ID, 4R          none
TMS_H                        1             In, pull-up         none
TCK_H                        1             In, pull-down       none
TDO_H                        1             Out, tri, 2D        none
TDI_H                        1             In, ptp, pull-up    none


Some of the boundary scan register cells in NVAX are grouped together to form sections. A section 
is simply a collection of pins that are identical in nature and have identical boundary scan cells 
on them. A section is generally controlled and operated identically during certain test modes. 
The pins in a section may also be logically related and may be located physically together. Some 
such sections are described below. 

BSR at TAG Store Interface 

The boundary scan register at the TAG Store interface consists of 4 sections: WE/OE bits, a driver
enable bit on the C_PAD_T%EN_TS_DRV_H signal, 23 data bits (tag, ECC and own), and 16 address
(index) bits. Figure 19-14 shows the block diagram. The boundary scan cell type used in each
section is listed in Table 19-5. (The figure does not show the actual order of connection.)



NVAX CPU Chip Functional Specification, Revision 1.1, August 1991

Figure 19-14: Boundary Scan Register at TAG Store Interface

[Block diagram not reproduced. Labels recoverable from the figure: bsr_extest_h, bsr_update_h,
bsr_shift_c1, bsr_capture_h, BSR Control Dispatch, IEEE P1149.1 Port.]



The following are some specific requirements. 

WE/OE Bits: The WE/OE bits use md_bcells with additional logic to allow proper operation of the
RAMs during interconnection testing. When bsr_extest_h is asserted, the test data injected on
the pins is as follows:

• TS_WE_L bit: Data injected is the logical OR of the value stored in the md_bcell's output
latch and the complement of the bsr_update_h signal.

• TS_OE_L bit: Data injected is the logical OR of the value stored in the md_bcell's output
latch and the complement of the bsr_capture_h signal.

The idea is to assert these two signals appropriately in a non-overlapping manner and only when the
boundary scan register is not shifting data. This enhancement allows the test operation to meet the
timing constraints in accessing the RAMs. (See reference [4].) It also protects the RAM interface
from the shifting data pattern.
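
The two OR terms above can be written directly as Boolean functions. The following Python sketch models only the stated logic for the case where bsr_extest_h is asserted; signal values are 0 or 1 and the function names are illustrative.

```python
def ts_we_l(output_latch, bsr_update_h):
    """Value injected on TS_WE_L: latch OR NOT bsr_update_h (pin is active low)."""
    return output_latch | (bsr_update_h ^ 1)


def ts_oe_l(output_latch, bsr_capture_h):
    """Value injected on TS_OE_L: latch OR NOT bsr_capture_h (pin is active low)."""
    return output_latch | (bsr_capture_h ^ 1)
```

With a 0 in the output latch, TS_WE_L goes low only while bsr_update_h is high, and TS_OE_L goes low only while bsr_capture_h is high, so neither strobe is active while data is shifting.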

BSR at Data RAM Interface 

The BSR section at the Data RAM interface also consists of 4 segments: WE/OE bits, a driver enable
bit on the C_PAD_D%EN_DR_DRV_H signal, 72 Data bits, and 18 Index bits. The block diagram and
operation of the BSR at the Data RAM interface are exactly the same as those of the BSR at the TAG
Store interface.

BSR at NDAL interface 

The BSR section at the NDAL data interface has a driver enable bit on the internal signal
C_PAD_N%NDAL_OUT_DRV_H. It allows the drivers on the bi-directional NDAL pins to be controlled
by JTAG during testing.
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19.10 Internal Scan Register and LFSR Reducer 

The NVAX CPU chip has several internal nodes observable via internal scan registers. This
observability facilitates chip debug. Some internal scan register sections are turned into LFSR
Reducers to enhance fault coverage and reduce test vectors during chip manufacturing tests.



19.10.1 Internal Scan Register Cells 

Figure 19-15 shows the block diagrams of two types of cells used in NVAX. ISR cell is used for 
Scan-only registers and ISL is used for implementing Scan-cum-LFSR registers. 

Figure 19-15: Cells for Internal Scan Registers 



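[Block diagrams not reproduced. The figure shows two cells built from D/Q latches clocked by
PHI 2 and PHI 4: the ISR cell (cell for scan-only register) and the ISL cell (cell for
scan-cum-LFSR register). Labeled signals include PI, SI, SO, load_h and lfsr_h.]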



Figure 19-16 shows how an LFSR is constructed by using ISL cells and an ISR cell. The ISR cell used
in the left-most bit position represents a dummy bit. The cell provides the multiplexer function
required to enable feedback during LFSR operation. (Note that this cell can be replaced by an
ordinary multiplexer.) The feedback taps for the LFSRs are based on a primitive characteristic
polynomial. (The actual taps used will be documented in respective box chapters when LFSR
size and other constraints are known.)
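
As a rough illustration of data compression with an LFSR, the Python sketch below models a 4-bit register with parallel inputs. The width and the tap positions are assumptions chosen only because they correspond to a degree-4 primitive polynomial (the register cycles through all 15 nonzero states); the actual NVAX sizes and taps are not given in this section.

```python
WIDTH = 4  # assumed width, for illustration only
MASK = (1 << WIDTH) - 1


def lfsr_step(state, parallel_in=0):
    """Advance one clock: shift, insert feedback, then XOR in the parallel inputs."""
    feedback = ((state >> 3) ^ state) & 1  # assumed taps: bits 3 and 0
    return (((state << 1) | feedback) ^ parallel_in) & MASK


def signature(samples, seed=0):
    """Compress a sequence of observed values into a single signature."""
    state = seed
    for sample in samples:
        state = lfsr_step(state, sample & MASK)
    return state


def period(seed=1):
    """Number of clocks before the free-running register returns to its seed."""
    state, steps = lfsr_step(seed), 1
    while state != seed:
        state, steps = lfsr_step(state), steps + 1
    return steps
```

A single changed sample alters the final signature, which is what makes a signature comparison useful as a pass/fail check.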

Internal Scan register's operations are controlled by internal NVAX clocks and by two signals 
received from the parallel port as follows: 

• PHI_4_H and PHI_2_H are internal NVAX clocks. PHI_4_H loads the master and PHI_2_H
loads the slave.
Figure 19-16: An ISR section turned into LFSR

[Diagram not reproduced. The parallel inputs are labeled PI<n>, PI<n-1>, ..., PI<0>.]




• When ISR_LOAD_H is asserted high, the master latches in ISR and ISL cells capture/observe
data from internal signals. When ISR_LOAD_H is low, the internal scan register
shifts data. Note that the shift occurs independent of the state of ISR_LFSR_H. ISR_LOAD_H
is latched in phase PHI_3 before it is used to control the ISRs.

• When both ISR_LFSR_H and ISR_LOAD_H are asserted high, the internal scan register sections
containing ISL cells operate as LFSRs and compress data. ISR_LFSR_H is also latched in
phase PHI_3 before it is used to control the internal LFSRs.
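
The effect of the two control signals can be summarized as a small decode. The Python sketch below follows the description above; the function name, the mode strings and the `has_isl_cells` flag are illustrative assumptions.

```python
def isr_mode(isr_load_h, isr_lfsr_h, has_isl_cells):
    """Return the operating mode of an internal scan register section."""
    if not isr_load_h:
        return "shift"    # shifting does not depend on ISR_LFSR_H
    if isr_lfsr_h and has_isl_cells:
        return "lfsr"     # ISL sections compress data
    return "capture"      # master latches observe internal signals
```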

19.10.2 Internal Scan Register Organization 

The Internal Scan Registers are divided into 2 groups: ISR1 and ISR2. The ISR1 consists of the 
scan register on the control store. It is used for patching the control store as well as reading out 
the control store during testing. 

ISR2 consists of all the other internal scan registers. Specific nodes included on the internal scan 
registers are listed in individual box chapters under their testability sections. The individual box 
scan registers are chained together, and are shifted out in the following order: Ibox, Ebox, Mbox, 
Cbox. 

Both ISR1 and ISR2 operate at the internal clock rate. However, they are read out at the parallel 
port at NDAL clock rate. See Section 19.4.1 for details of ISR1 and ISR2 operation. 



19.11 Output Pin Tri-state Control 

The NVAX CPU chip has a dedicated pin, DISABLE_OUT_L. When asserted low, the CPU chip tri-states
output drivers on all output-only and bi-directional pins, except those listed below. When asserted,
the pin also internally forces a reset to the NVAX CPU chip.
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The only exceptions are the TDO_H pin and NDAL clock output pins which are not tristated by 
the DISABLE_OUT_L pin. Not tristating clock output pins has been approved by the stage-1 module
test engineers. 

Leaving out the TDO_H pin allows the JTAG circuits to operate while chip tristate is in effect. 
This affords additional flexibility for the module manufacturing test. For example, during the 
interconnection test, the NVAX outputs may be allowed to drive only during the CAPTURE-DR 
state and kept in tristate in all other states. This can eliminate the effect of shifting patterns,
as well as drastically reduce the duration of time for which the drivers may see an interconnect
short fault.

The single pin tristate function is used only during testing. 

Note that the drivers on bi-directional I/O pins are also tristated by internal Cbox logic during
RESET and by the boundary scan register during the interconnection test (EXTEST mode). The
order of precedence is as follows: DISABLE_OUT_L, Boundary scan register, and the Cbox logic.
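
One possible reading of this precedence, for a single bi-directional pin, is sketched below in Python. The argument names are hypothetical, and the assumption that the boundary scan register's driver-enable cell decides whenever EXTEST is active is an interpretation of the text rather than a statement from it.

```python
def driver_enabled(disable_out_l, extest_active, bsr_drive_enable, cbox_drive_enable):
    """Decide whether a bi-directional pin driver is on, highest priority first."""
    if disable_out_l == 0:
        return False                    # DISABLE_OUT_L asserted low: always tristate
    if extest_active:
        return bool(bsr_drive_enable)   # boundary scan register controls the driver
    return bool(cbox_drive_enable)      # normal operation: Cbox logic controls it
```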

19.12 Operating Speed of Test Logic 

The IEEE 1149.1 Port and the boundary scan register are designed to be operable in the range 
0 to 10 MHz at least. Internal scan registers operate at internal clock rate. A higher speed of 
10 MHz (instead of 5 MHz) has been set to make the boundary scan register usable during the 
wafer probe testing. 



NOTE 

The JTAG circuitry design must account for the fact that TCK_H will not be driven in 
the running system. 
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Table 19-6: Revision History 
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When 


Description of change 


Dilip 


Bhavsar 


06-Mar-1989 


Release for external review. 


Dilip 


Bhavsar 


18-Jul-1989
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Dilip 


Bhavsar 
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Dilip 


Bhavsar 
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Dilip 


Bhavsar 
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(3.2) bcell on SYS_REST pin changed to md_bcell. IR and speed spec 
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Dilip 


Bhavsar 


14-June-1990 


(3.3) Parallel port modes changed. JTAG Reset added. Box controllability
removed.


Dilip 


Bhavsar 


03-July-1990 


(3.4) JTAG Reset finalized, bcell and other figures updated to reflect 
actual implementation. Timing on JTAG control signals changed to 
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John 


F. Brown 


03-Jul-1990 


Serial PCache Port details added. 


Dilip 


Bhavsar 


30-Jul-1990 


Reset actions by JTAG EXTEST instruction and DISABLE_OUT_L pin added.
Timing diagrams added. Parallel Port operation details added. ISR/ISL
clocking changed to PHI_4 (master) and PHI_2 (slave). Final Edits for
Rev 3.4. (See NITS 314, 330, 337, 351, 360)


John 


F. Brown 


23-Aug-90 


Timing diagram for Serial P-cache port added 


Dilip 


Bhavsar 


28-Sep-90 


Rev 3.5. PPort timing changed (NITS # 385). More description for 
PPort operatdonAlso, the boundary scan order updated to reflect im- 
plementation. 


John 


F. Brown 


20-Feb-91 


Rev 3.6. Updates for spec release: PPort fields & scan chain order 
Cause Parse Tree for Soft Error Interrupts

| S_CEFSTS<RDE>
+ + (select one)
S_CEFSTS<OREAD>
+ (select one)
| S_CEFSTS<WRITE>
| AND NOT S_CEFSTS<TO_MBOX>
+ + (select one)
| |
| S_CEFSTS<REQ_FILL_DONE>
otherwise
+ (select one)
S_CEFSTS<LOST_ERR>
| otherwise
+
| S_CEFSTS<TO_MBOX>
+ + (select one)
| |
| | S_CEFSTS<LOST_ERR>
otherwise
otherwise
otherwise
+ +

Inconsistent status (should cause hard error interrupt)
(Section 15.8.1.22)

Multiple errors in context of PTE read error
(Section 15.8.1.18.5)

D-stream NDAL ownership read for Mbox write read data
error before write data merged with fill data (Section 15.8.1.

Multiple errors in context of PTE read error
(Section 15.8.1.18.5)

D-stream NDAL ownership read read data error
(modify operand or read-lock) (Section 15.8.1.14)

Inconsistent status (either WRITE or TO_MBOX, but not both,
should be set) (Section 15.8.1.22)



| S_CEFSTS<IREAD>
+---+ (select one)
|   |
|   | S_CEFSTS<LOST_ERR>
|   +---> Multiple errors in context of PTE read error
|   |     (Section 15.8.1.18.5)
|   | otherwise
|   +---> I-stream NDAL read read data error
|         (Section 15.8.1.14)
| S_CEFSTS<TO_MBOX>
+---> D-stream NDAL read read data error (PTE read)
|     (Section 15.8.1.18.3)
| otherwise
+---> Inconsistent status (TO_MBOX should be set)
      (Section 15.8.1.22)
otherwise
---> Inconsistent status (either CEFSTS<RDE> or CEFSTS<TIMEOUT>
     should be set or, if CEFSTS<UNEXPECTED_FILL> is set, it
     should cause a hard error interrupt) (Section 15.8.1.22)






1 At least one potential PTE cause must be found or the status is inconsistent (see Section 15.8.1.22). Some of the outcomes
indicate a potential soft error interrupt cause which is not a potential PTE read error cause. These errors should be
treated separately.
| S_NESTS<NOACK> AND
| S_PCSTS<PTE_ER>[1]
+---+ (select one)
|   |
|   | S_NEOCMD<CMD>=IREAD
|   +---+ (select one)
|   |   |
|   |   | S_NESTS<LOST_OERR>
|   |   +---> Multiple errors in context of PTE read error
|   |   |     (Section 15.8.1.18.5)
|   |   | otherwise
|   |   +---> Unacknowledged I-stream NDAL read (Section 15.8.1.16)
|   |
|   | S_NEOCMD<CMD>=DREAD
|   +---> Unacknowledged D-stream NDAL read (PTE read)
|   |     (Section 15.8.1.18.4)
|   | S_NEOCMD<CMD>=OREAD
|   +---+ (select one)
|   |   |
|   |   | S_NESTS<LOST_OERR>
|   |   +---> Multiple errors in context of PTE read error
|   |   |     (Section 15.8.1.18.5)
|   |   | otherwise
|   |   +---> Unacknowledged D-stream NDAL read (modify operand or read-lock)
|   |         (Section 15.8.1.16)
|   | S_NEOCMD<CMD>=WRITE or WDISOWN
|   +---> Inconsistent status (should cause hard error interrupt)
|   |     (Section 15.8.1.22)
|   | otherwise
|   +---> Inconsistent status (invalid command in NEOCMD<CMD>)
|         (Section 15.8.1.22)
| S_NESTS<PERR>
+---+ (select one)
|   |
|   | S_NESTS<INCON_PERR>
|   +---> NDAL inconsistent parity error
|   |     (Section 15.8.1.19)
|   | otherwise
|   +---> NDAL parity error (Section 15.8.1.19)
|
| S_NESTS<LOST_PERR>
+---> Lost NDAL parity error or inconsistent parity error
|     (Section 15.8.1.20)
| (status consistent with soft error interrupt
| in system environment error registers)
+---> Soft error interrupt from system environment
|     (Section 15.8.1.21)
| none of the above
+---> Inconsistent status (possible machine check or hard error
      interrupt during soft error interrupt processing)
      (Section 15.8.1.22)



Figure 15-10 (Cont.): Cause Parse Tree for Soft Error Interrupts

Notation:

(select one)                Exactly one case must be true. If zero or more than one is
                            true, the status is inconsistent.

(select all)                More than one case may be true.

(select all, at least one)  All the cases are possible causes of a soft error interrupt.
                            More than one may be true. At least one must be true or the status
                            is inconsistent. A case is not considered true if it evaluates to
                            "Not a soft error interrupt cause".

otherwise                   Fall-through case for (select one) if no other case is true.

none of the above           Fall-through case for (select all) or (select all, at least one)
                            if no other case is true.
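
The notation above can be read as a small set of evaluation rules. The following is a minimal sketch of those rules in Python, not part of the NVAX error handling code. The function names, the INCONSISTENT marker, and the use of a dictionary of case names to booleans are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical marker for "the status is inconsistent" (Section 15.8.1.22).
INCONSISTENT = "inconsistent status"


def select_one(cases: Dict[str, bool], otherwise: Optional[str] = None) -> str:
    """Evaluate a (select one) node.

    Exactly one case must be true. If none is true, the "otherwise"
    fall-through is taken when the node has one. Any other outcome
    (zero true cases with no fall-through, or more than one true case)
    is an inconsistent status.
    """
    true_cases = [name for name, value in cases.items() if value]
    if len(true_cases) == 1:
        return true_cases[0]
    if not true_cases and otherwise is not None:
        return otherwise
    return INCONSISTENT


def select_all_at_least_one(
    cases: Dict[str, bool], none_of_the_above: Optional[str] = None
) -> List[str]:
    """Evaluate a (select all, at least one) node.

    Every true case is a possible cause. If no case is true, the
    "none of the above" fall-through is taken when the node has one;
    otherwise the status is inconsistent.
    """
    true_cases = [name for name, value in cases.items() if value]
    if true_cases:
        return true_cases
    if none_of_the_above is not None:
        return [none_of_the_above]
    return [INCONSISTENT]
```

For example, select_one({"IREAD": False, "OREAD": True}) selects the OREAD branch, while two true cases under the same (select one) node yield the inconsistent-status marker.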



15.8.1.1 VIC Parity Errors

Description: A parity error was detected in the VIC tag or data store in the Ibox. 
VIC Data Parity Errors: A parity error occurred in the data portion of the VIC. 
VIC Tag Parity Errors: A parity error occurred in the tag portion of the VIC. 
In all cases, the quadword virtual address of the error is in S_VMAR. 

Recovery procedures: To recover, disable and flush the VIC by re-writing all the tags (using 
the procedure in Section 15.3.3.1.1.1). Also, clear ICSR<LOCK>. 

15.8.1.2 Pcache Parity Errors 

Description: A parity error was detected in the Pcache. Either a tag parity error or a data 
parity error is reported, though tag parity errors in both the left and right banks may be reported 
simultaneously. The reference, whether it was a read or write, was passed to the Cbox as if the 
Pcache had missed. No data is lost. The Pcache is disabled because PCSTS<LOCK> is set. 

S_PCADR contains the physical address of the operation incurring the error. The address should not
be in IO space. If it is, it is an inconsistent status (see Section 15.8.1.22).

Recovery procedures: Clear PCSTS<LOCK>. Flush the Pcache and initialize the Pcache tag 
store (see Section 15.3.3.1.1.1.2). 

15.8.1.3 Bcache Tag Store Uncorrectable Errors 

Description: An uncorrectable ECC error or an addressing error resulted from reading the
Bcache tag store. The Bcache is in ETM. The hexaword physical address of the transaction
incurring the error is in S_BCETIDX. (If the physical address is found to be in IO space, it is
an inconsistent status. See Section 15.8.1.22.) S_BCETAG contains the actual tag data and
check bits read during the failing access. Software may use the routine TAG_ECC_CHECK in
Section 15.10 to check the tag data and determine the syndrome. The result of this check should
give the result expected from S_BCETSTS<UNCORR,BAD_ADDR>.

It should never be the case that both S_BCETSTS<BAD_ADDR> and S_BCETSTS<UNCORR> 
are set. If they are, it is an inconsistent status (see Section 15.8.1.22). 
For any normal Mbox command (i.e., not BCFLUSH), this error leads to a fill of the block whose
tag had the error. This is because the Cbox converts uncorrectable tag store errors into misses 
and sends the associated reference to memory. For reads, the reference sent out is a read or an 
ownership read, and when the data returns it is loaded in the Bcache. For writes, an ownership 
read is sent, and when the data returns the write is merged with it and it is loaded in the Bcache. 
When the fill finishes successfully, the tag is updated (overwriting the bad tag). If the fill times 
out, the tag is not overwritten. 

In some cases, this error leads to an NVAX CPU read timeout and/or a write timeout in memory. 
This occurs when the block was VALID-OWNED in the Bcache and is the same block that is being 
accessed by the failing operation. Errors resulting from these lost blocks are handled separately. 

Write-unlocks are a special case. No tag lookup is done for write-unlocks unless the Bcache is in 
ETM. If the Bcache is in ETM, and the tag store error occurs for that transaction, the write-unlock 
is sent to memory. 

Recovery procedure (all cases): Clear BCETSTS<LOCK>. If it is an addressing error, clear 
BCETSTS<BAD_ADDR>. Otherwise, clear BCETSTS<UNCORR>. 
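
As an illustration of the common recovery step above, the following Python sketch selects which BCETSTS fields to clear. It is a sketch under stated assumptions: the function name and the use of field-name strings are hypothetical, and the actual register write (an MTPR in the real handler) is not modelled.

```python
from typing import List


def bcetsts_fields_to_clear(bad_addr: bool, uncorr: bool) -> List[str]:
    """Return the BCETSTS fields to clear for an uncorrectable tag store error.

    bad_addr and uncorr are the saved S_BCETSTS<BAD_ADDR> and
    S_BCETSTS<UNCORR> values.
    """
    if bad_addr and uncorr:
        # Both bits set at once is an inconsistent status (Section 15.8.1.22).
        raise ValueError("inconsistent status: BAD_ADDR and UNCORR both set")
    # LOCK is cleared in all cases; BAD_ADDR for an addressing error,
    # UNCORR otherwise.
    return ["LOCK", "BAD_ADDR" if bad_addr else "UNCORR"]
```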

15.8.1.3.1 Case: BCETSTS<TS_CMD>=WUNLOCK 

Recovery procedure: Write an INVALID tag with good ECC to the tag with the error (using the
BCTAG access path). Then flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). 
Software should prepare for another tag error during the Bcache flush by clearing BCETSTS of 
unrecoverable errors. 

Restart conditions: 

The Bcache was in ETM at the time the write-unlock arrived. The data in memory may be
corrupt and memory's ownership bit was cleared. Memory is corrupted at the location indicated 
by S_BCETIDX. Software must determine if the error is fatal to one process or the whole system 
and take appropriate action. 

15.8.1.3.2 Case: BCETSTS<TS_CMD>=DREAD,IREAD,OREAD 

Recovery procedure: Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). 
Software should prepare for another tag error during the Bcache flush by clearing BCETSTS of 
unrecoverable errors. After flushing the Bcache, it is necessary to determine if any block is "lost". 
If a block's memory ownership bit is set and no writeback cache in the system has it owned, then 
the block is said to be lost. Use the procedure in Section 15.3.3.1.2.5. This procedure can result 
in finding no lost blocks, one lost block, or multiple lost blocks. 

Restart conditions: If there is one lost block, it is not recoverable. Software must determine if the lost
data was fatal to one process or the whole system and take appropriate action. 

If multiple blocks are lost (this isn't expected), crash the system. 
15.8.1.3.3 Case: BCETSTS<TS_CMD>=R_INVAL,O_INVAL,IPR_DEALLOCATE

Recovery procedure: Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). 
Software should prepare for another tag error during the Bcache flush by clearing BCETSTS of 
unrecoverable errors. After flushing the Bcache, it is necessary to determine if any block is "lost". 
If a block's memory ownership bit is set and no writeback cache in the system has it owned, then 
the block is said to be lost. Use the procedure in Section 15.3.3.1.2.5. This procedure can result 
in finding no lost blocks, one lost block, or multiple lost blocks. 

If exactly one block is lost and memory's owner ID information indicates this CPU, write a
VALID-OWNED tag with the address of the lost block into the tag which had the error (using 
the BCTAG access means). Then flush this location to memory. An error could occur with this 
flush, in which case the data is not recoverable. 

NOTE 

If memory does not store an owner ID with each block in a particular system, then this 
recovery method is not recommended. Instead, the data should be considered lost. 

Restart conditions: If there is one lost block, and the repair procedure didn't incur an error, 
restart. 

If the repair procedure was not successful, the data is not recoverable. Software must determine if the lost
data was fatal to one process or the whole system and take appropriate action. 

If multiple blocks are lost (this shouldn't result from one tag store error), crash the system. 
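
The restart conditions for this case can be summarized as a small decision function. The Python sketch below is illustrative only. The function name and return strings are hypothetical. Two branches are assumptions not stated in the text: that finding no lost blocks allows a restart, and that a single lost block not attributed to this CPU is handled like an unsuccessful repair.

```python
def tag_error_restart_action(
    lost_blocks: int, owner_is_this_cpu: bool, repair_ok: bool
) -> str:
    """Pick a restart action after the lost-block search of Section 15.3.3.1.2.5."""
    if lost_blocks < 0:
        raise ValueError("lost_blocks must be non-negative")
    if lost_blocks > 1:
        # Multiple lost blocks should not result from one tag store error.
        return "crash system"
    if lost_blocks == 0:
        # Assumption: with nothing lost there is nothing to repair.
        return "restart"
    if owner_is_this_cpu and repair_ok:
        # One lost block, repaired by writing a VALID-OWNED tag and flushing.
        return "restart"
    # Data not recoverable: software decides whether the loss is fatal
    # to one process or to the whole system.
    return "assess lost data"
```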

15.8.1.4 Lost Bcache Tag Store Errors 

Some number of unrecoverable Bcache tag store errors occurred and were not latched because
BCETSTS already contained a report of an unrecoverable error. All unrecoverable tag store errors
cause a soft error interrupt, so this is definitely a cause of the soft error interrupt.

Lost Bcache tag store errors may be caused by more than one operand prefetch to the same cache
block.

The Bcache is in ETM. 

Unrecoverable tag store errors can cause lost data by overwriting blocks in the Bcache. 

Unrecoverable tag store errors in ETM on write-unlocks can cause corrupted memory data. 

Recovery procedure: Clear BCETSTS<LOST_ERR>. Flush the Bcache. Clear
CCTL<HW_ETM> (after flushing the Bcache). Software should prepare for another tag error 
during the Bcache flush by clearing BCETSTS of unrecoverable errors. 

Restart conditions: Lost write-unlock errors may have corrupted memory. Crash the system. 

15.8.1.5 Bcache Tag Store Correctable ECC errors 

Description: A correctable error occurred in accessing the Bcache tag store. The Bcache is
not in ETM. S_BCETIDX contains the physical address of the error. (If the physical address
is found to be in IO space, it is an inconsistent status. See Section 15.8.1.22.) (The index
portion of S_BCETIDX indicates which tag store entry had the error.) S_BCETAG contains the
actual tag data and check bits read during the failing access. Software may use the routine
TAG_ECC_CHECK in Section 15.10 to check the tag data and determine the syndrome. The
result of this check should be a correctable single-bit error.

Recovery procedures: Clear BCETSTS<CORR>. 

If the operation was anything but a tag lookup for an explicit IPR deallocate operation (i.e., 
BCFLUSH), software should flush that one location by writing the BCFLUSH IPR. 

TBS (MTPR to (BCFLUSH + (S_BCETIDX & INDEX_MASK) ) ) 

This effectively scrubs the Bcache tag store location by invalidating it and forcing it to be written 
back if it is owned. This may be done without putting the Bcache in software ETM. 
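
The address arithmetic in the TBS expression above can be sketched as follows. This Python fragment only illustrates the masking and addition; the BCFLUSH base address and INDEX_MASK value are not given in this section, so they are passed in as parameters, and the function name is hypothetical.

```python
def bcflush_ipr_address(bcflush_base: int, s_bcetidx: int, index_mask: int) -> int:
    """Compute the BCFLUSH IPR address for the tag store entry that had the error.

    Only the index portion of the saved S_BCETIDX value selects the entry,
    so the remaining address bits are masked off before adding the base.
    """
    return bcflush_base + (s_bcetidx & index_mask)
```

Software would then issue an MTPR to the resulting IPR address to flush that one location.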

15.8.1.6 Lost Bcache Tag Store Correctable ECC errors 

Description: A correctable error occurred in accessing the Bcache tag store, but it is lost because 
of an uncorrectable tag store error which also occurred. 

Recovery procedures: Clear BCETSTS<CORR>. 

The Bcache should be flushed (and it would be because of the uncorrectable error in any case). 
This effectively scrubs the Bcache tag store location by invalidating it. 

15.8.1.7 Bcache Data RAM Correctable ECC Errors 

Description: A correctable error occurred in accessing the Bcache data RAM. The Bcache is 
not in ETM. S_BCEDIDX contains the cache index of the error, and S_BCEDECC contains the 
syndrome calculated by the ECC logic. It is not possible to reliably determine the physical address 
of the error, since the Bcache is not in ETM and therefore the block can be overwritten at any 
time after the error. 

Recovery procedures: Clear BCEDSTS<CORR>. 

If the operation was a read (S_BCEDSTS<DR_CMD>=DREAD or IREAD), software should flush
that one location using the BCFLUSH IPR.

TBS (MTPR to (BCFLUSH_BASE + (BCEDIDX & INDEX_MASK) ) )

This effectively scrubs the Bcache data RAM location by invalidating it and forcing it to be written 
back if it is owned. This may be done without putting the Bcache in software ETM. 

15.8.1.8 Lost Bcache Data RAM Correctable ECC Errors 

Description: A correctable error occurred in accessing the Bcache data RAM, but it is lost 
because of an uncorrectable data RAM error which also occurred. The address and syndrome of 
the error are not known. 

Recovery procedures: Clear BCEDSTS<CORR>. 

The Bcache should be flushed (and it would be because of the uncorrectable error in any case). 
This effectively scrubs the Bcache data RAM location by invalidating it and forcing it to be written 
back if it is owned. 
15.8.1.9 Bcache Data RAM Uncorrectable ECC Errors and Addressing Errors on I-Stream or
D-Stream Reads

Description (addressing error): A Bcache addressing error was detected by the Cbox in an
I-stream or D-stream read during a Bcache hit. Addressing errors are the result of a mismatch
between the address the Cbox drives to the RAMs for a read access and the address used to write
that location. A multiple bit data error can appear to be an addressing error, though it is extremely
unlikely.

Description (uncorrectable ECC error): A Bcache uncorrectable ECC error was detected by 
the Cbox in an I-stream or D-stream read during a Bcache hit. Uncorrectable data errors are the 
result of a multiple bit error in the data read from the Bcache. An addressing error with a single 
bit data error will appear as an uncorrectable data error. 

Description (both cases): The Bcache is in ETM. S_BCEDIDX contains the cache index of
the error, and S_BCEDECC contains the syndrome calculated by the ECC logic. The physical
address of the reference can be found by reading the tag for the data block (using the procedure
in Section 15.3.3.1.2.4). (If the physical address is found to be in IO space, it is an inconsistent
status. See Section 15.8.1.22.)

If the block's tag is found to contain an ECC error, then the address cannot be determined.

It should never be the case that both S_BCEDSTS<BAD_ADDR> and S_BCEDSTS<UNCORR> 
are set. If they are, it is an inconsistent status (see Section 15.8.1.22). 

Recovery procedures: To recover, clear BCEDSTS<LOCK>. Also, if it is an addressing error,
clear BCEDSTS<BAD_ADDR>. Otherwise, clear BCEDSTS<UNCORR>.

Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). If the data is owned by 
the Bcache and if the error repeats itself (is not transient), then a writeback error will result 
from the flush procedure. Software should prepare for this by clearing NESTS and BCEDSTS 
errors. 

Restart Conditions: If a writeback error occurs in the Bcache flush, then the data is presumed 
to be unrecoverable. See the next section for a description of handling an error in a writeback. 
Software must determine if the error is fatal to one process or the whole system and take 
appropriate action. 

If the address of the error in the flush is not the same as that of the original error, this is a 
multiple error case in the data RAMs and is a serious failure. Crash the system. 

15.8.1.10 Bcache Data RAM Uncorrectable ECC Errors and Addressing Errors on Writebacks 

Description (addressing error): A Bcache addressing error was detected by the Cbox in a
writeback. Addressing errors are the result of a mismatch between the address the Cbox drives
to the RAMs for a read access and the address used to write that location. A multiple bit data
error can appear to be an addressing error, though it is extremely unlikely. The NDAL WDATA
cycle was converted to a BADWDATA cycle. Memory should have tagged the location as bad and
unreadable by an implementation specific mechanism.

Description (uncorrectable ECC error): A Bcache uncorrectable ECC error was detected by
the Cbox in a writeback. Uncorrectable data errors are the result of a multiple bit error in
the data read from the Bcache. An addressing error with a single bit data error will appear as
an uncorrectable data error. The NDAL WDATA cycle was converted to a BADWDATA cycle.
Memory should have tagged the location as bad and unreadable by an implementation specific
mechanism.

Description (both cases): The Bcache is in ETM. S_NESTS<BADWDATA> should be set. If
it isn't, and S_NESTS<LOST_OERR> and S_NESTS<NOACK> aren't set, then the writeback
which incurred the error is still in the writeback queue in the BIU. Software should force the
writeback queue to be drained (causing the second error event to occur) by reading from the CWB
register.

MFPR #PR19$_CWB,R0 

After this, NESTS, NEOADR, and NEOCMD should be captured again. 

If S_NESTS<BADWDATA> is set, then S_NEOADR contains the physical address of the lost
writeback data. (If the physical address is found to be in IO space, it is an inconsistent status.
See Section 15.8.1.22.)

If S_NESTS<BADWDATA> isn't set but S_NESTS<LOST_OERR> is, then the address of the lost 
writeback data is not available. 

If after draining the writeback queue, S_NESTS<BADWDATA> isn't set, then an inconsistency 
exists (see Section 15.8.1.22). 
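
The checks on the saved NESTS bits described above can be sketched as one classification step. This Python fragment is an illustration under assumptions: the function name, the drained flag (true once the CWB read has been issued and NESTS, NEOADR, and NEOCMD recaptured), and the return strings are hypothetical. The final branch covers a combination the text does not describe.

```python
def classify_writeback_error(
    badwdata: bool, lost_oerr: bool, noack: bool, drained: bool
) -> str:
    """Classify the saved S_NESTS state for a Bcache data RAM writeback error."""
    if badwdata:
        # S_NEOADR holds the physical address of the lost writeback data.
        return "address in S_NEOADR"
    if drained:
        # BADWDATA still clear after draining the writeback queue.
        return "inconsistent status"
    if lost_oerr:
        return "address not available"
    if not noack:
        # The writeback is still queued in the BIU: read CWB, then recapture.
        return "drain writeback queue"
    # Assumption: NOACK set alone is not covered by this description.
    return "not covered by this section"
```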

It should never be the case that both S_BCEDSTS<BAD_ADDR> and S_BCEDSTS<UNCORR> 
are set. If they are, it is an inconsistent status (see Section 15.8.1.22). 

Recovery procedures: To recover, clear BCEDSTS<LOCK> and NESTS<BADWDATA>, 
if it is set. If it is an addressing error, clear BCEDSTS<BAD_ADDR>, otherwise clear 
BCEDSTS<UNCORR>. Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). 
Then use the system specific memory repair procedure to undo the tagged-bad data in memory 
(see Section 15.3.3.1.2.2.3). 

NOTE 

When clearing the tagged-bad data state of memory, software must first ensure that no 
more accesses to the block can occur. Otherwise there is the danger that some process 
on some other processor or a DMA IO device will see incorrect data and not detect an
error. 

Restart Conditions: The data is lost. Software must determine if the error is fatal to one
process or the whole system and take appropriate action. If the address of the lost data could not 
be determined, crash the system. 

15.8.1.11 Lost Bcache Data RAM Errors With Possible Lost Writebacks 

Description: Lost Bcache data RAM errors which cause only a soft error interrupt (when 
S_NESTS indicates the possibility of a lost writeback error) indicate that data errors occurred 
on reads or writebacks, but no new write data was lost. S_NESTS reports the writeback error, 
unless multiple NDAL output errors have occurred. 

The Bcache is in ETM.

Lost Bcache data RAM errors of this kind can be caused by an operand prefetch from a Bcache 
block followed by a write to the same block. 
If S_NESTS<BADWDATA> is set, then S_NEOADR contains the physical address of a writeback.
(If the physical address is found to be in IO space, it is an inconsistent status. See
Section 15.8.1.22.)

Recovery procedures: To recover, clear BCEDSTS<LOST_OERR>. Flush the Bcache. Clear
CCTL<HW_ETM> (after flushing the Bcache). Writeback errors may occur during the flush. 
Software should prepare for this by clearing NESTS and BCEDSTS errors. 

If S_NESTS<BADWDATA> is set, clear NESTS<BADWDATA>. Use the system specific memory 
repair procedure to undo the tagged-bad data in memory (see Section 15.3.3.1.2.2.3) (the Bcache 
must be flushed before this repair procedure). 

NOTE 

When clearing the tagged-bad data state of memory, software must first ensure that no 
more accesses to the block can occur. Otherwise there is the danger that some process 
on some other processor or a DMA IO device will see incorrect data and not detect an
error. 

Restart condition (S_NESTS<LOST_OERR> set): There is no way to determine how many 
writebacks failed. They all should have gone to memory with BADWDATA cycles, where memory 
would have them marked as tagged-bad data. So an unknown block may be tagged-bad in memory. 
If so, the next access to that block could come from the system itself, even if it "belonged" only to 
one process. This will cause the system to crash. But there is a chance that the next access will 
come from a user process. This would allow the system to stay up, though that process would 
have to be deleted. 

If the system's implementation of tagged-bad data is not reliable (see Section 15.11.1, Note On 
Tagged-Bad Data Mechanisms), software should crash the system. If it is reliable, restart. 

Restart condition (S_NESTS<LOST_OERR> not set): 

The writeback data is lost but the address is known. Software must determine if the error is 
fatal to one process or the whole system and take appropriate action. 
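
The two restart conditions for this section reduce to the following decision. The Python sketch is illustrative; the function name, the tagged_bad_reliable flag (standing for the system-specific judgement described in Section 15.11.1), and the return strings are assumptions.

```python
def lost_data_ram_restart_action(lost_oerr: bool, tagged_bad_reliable: bool) -> str:
    """Pick a restart action for lost Bcache data RAM errors with possible lost writebacks."""
    if lost_oerr:
        # An unknown number of writebacks failed, so an unknown block may be
        # tagged bad in memory. Restart only if tagged-bad data is reliable.
        return "restart" if tagged_bad_reliable else "crash system"
    # The writeback data is lost but its address is known: software decides
    # whether the loss is fatal to one process or to the whole system.
    return "assess lost data"
```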

15.8.1.12 Lost Bcache Data RAM Errors Without Lost Writebacks 

Description: Lost Bcache data RAM errors which cause only a soft error interrupt (when 
S_NESTS indicates no possibility of writeback error) indicate that data errors occurred on reads. 
No write data was lost. 

Lost Bcache data RAM errors may be caused by more than one operand prefetch to the same 
cache block. 

The Bcache is in ETM.

Recovery procedures: To recover, clear BCEDSTS<LOST_OERR>. Flush the Bcache. Clear 
CCTL<HW_ETM> (after flushing the Bcache). Writeback errors may occur during the flush. 
Software should prepare for this by clearing NESTS and BCEDSTS errors. 

Restart condition: Only reads from the Bcache failed. Restart is possible unless any error 
encountered during Bcache flush is fatal. 
15.8.1.13 NDAL I-Stream or D-Stream Read or D-Stream Ownership Read Timeout Errors

Description: An I-stream or D-stream read or D-stream ownership read timed out before all
the fill quadwords were received. This is not an accepted means for a system environment
to notify the NVAX CPU of "non-existent memory or IO location". The error could be caused
by an error in the system environment or an NDAL parity error on the returned data. It
also could be caused by some previous error in the system environment or this CPU which
leaves a cache block marked as owned in memory and not marked as owned in any cache in
the system. S_CEFSTS<COUNT> indicates the number of quadwords received before the error.
(S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space. If the address
is in memory space, S_CEFSTS<COUNT> indicates the number of quadwords received.) The
physical address is in S_CEFADR.

I-stream or D-stream read: The Bcache is not in ETM. 

D-stream ownership read: The Bcache is in ETM. No write data has been merged with the 
returning fills. 

The address should not be in IO space. If it is, it is an inconsistent status (see Section 15.8.1.22).

If the ownership read was for an Mbox write, the write was sent on the NDAL after the OREAD 
timed out. 

If the ownership read was for a read-lock, the corresponding write-unlock should have been 
received from the Ebox. The write-unlock is sent as a quadword WDISOWN by the Cbox, so no 
memory location is left owned. (If the error was on the requested quadword, a machine check 
would definitely have resulted. If a separate error prevents the write-unlock, that will be reported
in other error registers.)

Recovery procedures (all cases): Clear CEFSTS<LOCK,TIMEOUT>.

Additional Recovery procedures for D-stream ownership read (S_CEFSTS<WRITE> 
set): Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). 

Depending on the system environment, memory may have set its ownership bit for this block. If 
so the write data must have been lost, and a hard error interrupt is expected. Use the system 
dependent procedure for resetting the ownership bit in memory.

If memory would not have set its ownership bit for this block, memory's state may be correct and 
up to date. 

Additional Recovery procedures for D-stream ownership read (S_CEFSTS<WRITE> not 
set): Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). 

Depending on the system environment, memory may have set its ownership bit for this block. 
The data in memory is presumably still good. The Bcache block is marked invalid in the Bcache
tag store. However, if the error occurred on a read-lock, the corresponding write-unlock should 
have occurred and it will have cleared the ownership bit for this block. 

If S_CEFSTS<COUNT> is greater than 0, then part of the data also is in the Bcache. 
In general, it is not possible to determine which quadwords are valid. However, if 
S_CEFSTS<REQ_FILL_DONE> is set, then the quadword in the Bcache block pointed to by 
S_CEFADR is valid (except in the case of a read-lock, but the data shouldn't be needed for 
memory repair in that case). 
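
The rule above for deciding whether the requested quadword in the Bcache can be trusted can be sketched as a predicate. This is a sketch only: the function name is hypothetical, and treating COUNT as a 2-bit field (values 0 to 3) is an assumption based on the "11 (binary)" value mentioned for IO space.

```python
def requested_quadword_valid(count: int, req_fill_done: bool, read_lock: bool) -> bool:
    """Return True if the quadword at S_CEFADR in the Bcache block is valid.

    count is the saved S_CEFSTS<COUNT>, req_fill_done is the saved
    S_CEFSTS<REQ_FILL_DONE>, and read_lock says whether the failing
    ownership read was for a read-lock.
    """
    if not 0 <= count <= 3:
        # Assumption: COUNT is a 2-bit field.
        raise ValueError("COUNT out of range")
    # Some fill data must have arrived, the requested quadword must have
    # been filled, and the read-lock exception must not apply.
    return count > 0 and req_fill_done and not read_lock
```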
If S_CEFSTS<COUNT> is greater than 0, and the address in S_CEFADR is not in IO space,
then the block was not owned before the operation began. In this case, use the system dependent 
procedures (see Section 15.3.3.1.2.2.1) to determine if memory's ownership bit is set and this CPU 
owns the block. If so, use the system specific procedure (see Section 15.3.3.1.2.2.2) to reset it. In 
some systems (the XMI2 for example) this may require a quadword of correct data be written to 
memory to reset the ownership bit. Section 15.3.3.1.2.3 describes procedures for extracting data 
from the Bcache data RAMs in this case. 

If memory's ownership bit was left set as a result of this error and no non-destructive procedure 
exists for restoring it, then the hexaword block is lost. 

Restart condition: Restart if the memory state repair procedure is successful or no repair is 
called for, no data is lost, and the address is not in IO space. If the hexaword block could not be
repaired or data is lost, software must determine if the error is fatal to one process or the whole 
system and take appropriate action. 

Post Restart Recovery: If the same fill error recurs on restart, then the block is probably
"lost".[1] Software must determine if the error is fatal to one process or the whole system and take
appropriate action. (If it is fatal only to one process, use the system dependent procedure for
resetting memory's ownership bit.)

NOTE 

It may be appropriate in this case to first cause each CPU in the system to flush its 
Bcache, and then restart once more. 

NOTE 

It may be that another error (such as an uncorrectable tag store error on a coherence 
request) will be repaired by the soft error interrupt handler before the restart actually 
occurs, fortuitously repairing the cause of the fill error. 

15.8.1.14 NDAL I-Stream or D-Stream Read or D-Stream Ownership Read Data Errors

Description: An I-stream or D-stream read or D-stream ownership read terminated with an RDE
(read data error) NDAL cycle before all the fill quadwords were received. If S_CEFSTS<COUNT>
is 0 or the address is an IO space address, this is an accepted means for a system environment
to notify the NVAX CPU of "non-existent memory or IO location". Otherwise, the error could be
caused by an error in the system environment. It also could be caused by some previous error
in the system environment or this CPU which leaves a cache block marked as owned in memory
and not marked as owned in any cache in the system.

S_CEFSTS<COUNT> indicates the number of quadwords received before the error. 
(S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space.)

In any case, the physical address is in S_CEFADR. 

I-stream or D-stream read: The Bcache is not in ETM.

D-stream ownership read: The Bcache is in ETM. No write data has been merged with the 
returning fills. 



1 In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the
data back when a read is done to that location. In some systems, it may be possible to identify which CPU memory 
"thinks" owns the data, but it is often not possible to determine which error caused this situation to arise. 
The address should not be in IO space. If it is, it is an inconsistent status (see Section 15.8.1.22).

If the ownership read was for an Mbox write, the write was sent on the NDAL after the OREAD 
was aborted. 

If the ownership read was for a read-lock, the corresponding write-unlock should have been
received from the Ebox. The write-unlock is sent as a quadword WDISOWN by the Cbox, so no
memory location is left owned. (If the error was on the requested quadword, a machine check
would definitely have resulted. If a separate error prevents the write-unlock, that will be
reported in other error registers.)

Recovery procedures (all cases): Clear CEFSTS<LOCK, RDE>. 

Additional Recovery procedures for D-stream ownership read (S_CEFSTS<WRITE> 
set): Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). 

Depending on the system environment, memory may have set its ownership bit for this block. If 
so the write data must have been lost, and a hard error interrupt is expected. Use the system 
dependent procedure for resetting the ownership bit in memory. 

If memory would not have set its ownership bit for this block, memory's state may be correct and 
up to date. 

Additional Recovery procedures for D-stream ownership read (S_CEFSTS<WRITE> not 
set): Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). 

Depending on the system environment, memory may have set its ownership bit for this block. 
The data in memory could still be good. The Bcache block is marked invalid in the Bcache tag 
store. However, if the error occurred on a read-lock, the corresponding write-unlock should have 
occurred and it will have cleared the ownership bit for this block. 

If S_CEFSTS<COUNT> is greater than 0, then part of the data also is in the Bcache. 
In general, it is not possible to determine which quadwords are valid. However, if 
S_CEFSTS<REQ_FILL_DONE> is set, then the quadword in the Bcache block pointed to by 
S_CEFADR is valid (except in the case of a read-lock, but the data shouldn't be needed for 
memory repair in that case). 

If S_CEFSTS<COUNT> is greater than 0, and the address in S_CEFADR is not in IO space, 
then the block was not owned before the operation began. In this case, use the procedures 
in Section 15.3.3.1.2.2 to determine if memory's ownership bit is set. If so, use the system 
specific procedure (see Section 15.3.3.1.2.2.2) to reset it. In some systems (the XMI2 for example) 
this may require a quadword of correct data be written to memory to reset the ownership bit. 
Section 15.3.3.1.2.3 describes procedures for extracting data from the Bcache data RAMs in this 
case. 

If memory's ownership bit was left set as a result of this error and no non-destructive procedure 
exists for restoring it, then the hexaword block is lost. 

Restart condition: Restart if the memory state repair procedure is successful or no repair is 
called for, no data is lost, and the address is not in IO space. If the hexaword block could not be 
repaired or data is lost, software must determine if the error is fatal to one process or the whole 
system and take appropriate action. 
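
The restart condition above can be written as a small decision function. The following Python sketch is illustrative only: the function and parameter names are assumptions introduced here, not part of the NVAX specification, and a real handler would derive these facts from the saved error registers and the memory repair procedure.

```python
def fill_error_restart_ok(repair_needed, repair_succeeded, data_lost, addr_in_io_space):
    """Return True if restart is permitted after a Bcache fill error.

    Hypothetical helper mirroring the restart condition in the text:
    the memory state repair succeeded (or none was called for),
    no data was lost, and the address is not in IO space.
    """
    memory_state_ok = (not repair_needed) or repair_succeeded
    return memory_state_ok and not data_lost and not addr_in_io_space


# No repair needed, nothing lost, memory-space address: restart.
print(fill_error_restart_ok(False, False, False, False))
# Repair was needed but failed: software must decide how fatal the error is.
print(fill_error_restart_ok(True, False, False, False))
```

When the function returns False, the text leaves the choice between killing one process and crashing the system to software; the sketch deliberately does not model that choice.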



15-78 Error Handling 



DIGITAL CONFIDENTIAL 



NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 



Post Restart Recovery: If the same fill error recurs on restart, then the block is probably 
"lost". 1 Software must determine if the error is fatal to one process or the whole system and take 
appropriate action. (If it is fatal only to one process, use the system dependent procedure for 
resetting memory's ownership bit.) 

NOTE 

It may be appropriate in this case to first cause each CPU in the system to flush its 
Bcache, and then restart once more. 

NOTE 

It may be that another error (such as an uncorrectable tag store error on a coherence 
request) will be repaired by the soft error interrupt handler before the restart actually 
occurs, fortuitously repairing the cause of the fill error. 

15.8.1.15 Lost Bcache Fill Error 

Description: Some number of fill errors occurred and were not latched because CEFSTS and 
CEFADR already contained a report of an unrecoverable error. Lost Bcache fill errors which do 
not cause hard error interrupts are always read errors. 

Lost Bcache fill errors may be caused by more than one operand prefetch to the same cache block. 

Lost Bcache fill errors may leave blocks marked owned by this CPU in memory without the 
Bcache actually owning the block. 

The Bcache may be in ETM. Read S_CCTL<HW_ETM> to find out. 

Recovery procedures: Clear CEFSTS<LOST_ERR>. If the Bcache is in ETM, flush the Bcache 
and clear CCTL<HW_ETM> (in that order). 

Restart condition: Lost Bcache fill errors may leave blocks marked owned by this CPU in 
memory without the Bcache actually owning the block. In systems where the ownership bits are 
very reliably maintained (see Section 15.11.2, Note On Ownership Mechanism), restart. 

In systems where the ownership bits are not very reliably maintained, crash the system. 
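
The recovery and restart rules for a lost Bcache fill error can be summarized as an ordered list of actions. The Python sketch below is a model for illustration, assuming two hypothetical inputs (whether S_CCTL<HW_ETM> was found set, and whether the system maintains ownership bits very reliably); the action strings are descriptive labels, not register-access code.

```python
def lost_fill_error_actions(hw_etm_set, ownership_bits_reliable):
    """Return the ordered actions for a lost Bcache fill error.

    Hypothetical model of the text: always clear CEFSTS<LOST_ERR>;
    if the Bcache is in ETM, flush it and then clear CCTL<HW_ETM>
    (in that order); finally restart or crash depending on how
    reliably the system maintains memory ownership bits.
    """
    steps = ["clear CEFSTS<LOST_ERR>"]
    if hw_etm_set:
        steps.append("flush Bcache")
        steps.append("clear CCTL<HW_ETM>")
    steps.append("restart" if ownership_bits_reliable else "crash")
    return steps


for step in lost_fill_error_actions(hw_etm_set=True, ownership_bits_reliable=False):
    print(step)
```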

15.8.1.16 Unacknowledged NDAL I-Stream or D-Stream Read or D-Stream Ownership Read 

Description: An I-stream or D-stream read or D-stream ownership read was no-ACKed by the 
system environment. This could be because the external component(s) received bad NDAL parity 
or it could be due to a system-specific notification of "non-existent memory or IO location". The 
physical address is in S_CEFADR. 

I-stream or D-stream read: The Bcache is not in ETM. 
D-stream ownership read: The Bcache is in ETM. 

The address should not be in IO space. If it is, it is an inconsistent status (see Section 15.8.1.22). 



If the ownership read was for an Mbox write, the write was sent on the NDAL after the OREAD 
timed out. If the write was also no-ACKed, a hard error interrupt would have been posted. That 
is handled as a separate error. 

Recovery procedures (all cases): Clear NESTS<NOACK>. 

Additional Recovery procedure for D-stream ownership read: Flush the Bcache. Clear 
CCTL<HW_ETM> (after flushing the Bcache). No error is expected during the Bcache flush. 

15.8.1.17 Lost NDAL Output Error 

Description: Some number of NDAL output errors occurred. Some number of read no-ACKs 
and/or BADWDATAs were missed. A hard error interrupt would have occurred if a write or 
writeback was no-ACKed. 

Lost NDAL output errors may be caused by more than one operand prefetch to the same cache 
block. 

The Bcache may be in ETM. Read S_CCTL<HW_ETM> to find out. 

Recovery procedure: Clear NESTS<LOST_OERR>. If CCTL<HW_ETM> is set, flush the 
Bcache and clear CCTL<HW_ETM> (in that order). 

Restart conditions: Lost NDAL output errors may leave tagged bad locations in memory. In 
systems where the method of implementing tagged-bad data is reliable (see Section 15.11.1, Note 
On Tagged-Bad Data Mechanisms), restart. 

If a tagged-bad block is not reliable in the particular system, crash the system. 

15.8.1.18 PTE read errors 

The following sections describe error handling for PTE read errors. PTE read errors are read 
errors which happen in reads issued by the Mbox in handling a TB miss. Handling of these errors 
is different from handling the same underlying error (Bcache data RAM error, Bcache fill error, 
or NDAL no-ACK error) when PTE read isn't the cause. 

If S_PCSTS<PTE_ER> is set, then a PTE read issued by the Mbox in processing a TB miss had 
an unrecoverable error. The TB miss sequence was aborted because of the error. The original 
reference can be any I-stream or D-stream read or write. 

PTE read errors are difficult to analyze, partly because the read error report in the Cbox does 
not directly indicate that the failing read was a PTE read. Because of this and because PTE read 
errors should be rare (a very small percentage of the reads issued by the Mbox are PTE reads), 
multiple errors which interfere with the analysis of the PTE error are not considered recoverable. 

If the reference which incurs the PTE read error is a write, S_PCSTS<PTE_ER_WR> will be set. 
In this case the original write is lost. No retry is possible partly because the instruction which 
took the machine check may be subsequent to the one which issued the failing write. Also, PTE 
read errors on write transactions can cause a machine check at a practically arbitrary time in a 
microcode flow, and core machine state may not be consistent. 






15.8.1.18.1 Bcache Data RAM Uncorrectable ECC Errors and Addressing Errors on PTE Reads 

Description (addressing errors): A Bcache addressing error was detected by the Cbox in a PTE 
read during a Bcache hit. Addressing errors are the result of a mismatch between the address 
the Cbox drives to the RAMs for a read access and the address used to write that location. A 
multiple bit data error can appear to be an addressing error, though it is extremely unlikely. 

Description (uncorrectable ECC errors): A Bcache uncorrectable data error was detected 
by the Cbox in a PTE read during a Bcache hit. Uncorrectable data errors are the result of a 
multiple bit error in the data read from the Bcache. An addressing error with a single bit data 
error will appear as an uncorrectable data error. 

Description (all cases): The Bcache is in ETM. S_BCEDIDX contains the cache index of the 
error, and S_BCEDECC contains the syndrome calculated by the ECC logic. The physical address 
of the PTE read can be found by reading the tag for the data block (using the procedure in 
Section 15.3.3.1.2.4). (If the physical address is found to be in IO space, it is an inconsistent 
status. See Section 15.8.1.22.) 

If the block's tag is found to contain an ECC error, then the address can not be determined. 

S_BCEDSTS<LOST_ERR> may be set. This lost error is probably due to the same PTE error 
occurring more than once. This is an acceptable assumption unless a hard error interrupt occurs 
after handling this error. 

It should never be the case that both S_BCEDSTS<BAD_ADDR> and S_BCEDSTS<UNCORR> 
are set. If they are, it is an inconsistent status (Section 15.5.2.7). 

Recovery procedures (addressing errors): To recover, clear BCEDSTS<LOCK, BAD_ADDR>. 

Recovery procedures (uncorrectable ECC errors): To recover, clear BCEDSTS<LOCK, 
UNCORR>. 

Recovery procedures (both cases): Flush the Bcache. Clear CCTL<HW_ETM> (after flushing 
the Bcache). Clear PCSTS<PTE_ER>. If the data is owned by the Bcache and if the error repeats 
itself (is not transient), then a writeback error will result from the flush procedure. Software 
should prepare for this by clearing NESTS and BCEDSTS errors. 

Restart condition: If no writeback error occurs in the Bcache flush, restart if: 

(S_PCSTS<PTE_ER_WR> = 0). 

If 

(S_PCSTS<PTE_ER_WR> = 1), 

crash the system. 

If a writeback error occurs in the Bcache flush, then the data is presumed to be unrecoverable. See 
Section 15.8.1.10 for a description of handling an error in a writeback (software must determine 
if the error is fatal to one process or the whole system and take appropriate action). 
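
The restart condition for this case has three outcomes, which the following Python sketch models. It is a minimal illustration under stated assumptions: the function name, parameter names, and returned labels are hypothetical, and the writeback-error path stands in for the separate handling described in Section 15.8.1.10.

```python
def pte_data_ram_error_disposition(writeback_error_in_flush, pte_er_wr):
    """Classify the outcome of a Bcache data RAM error on a PTE read.

    Hypothetical model of the text:
      - a writeback error during the Bcache flush means the data is
        presumed unrecoverable (handled per Section 15.8.1.10);
      - otherwise restart only if S_PCSTS<PTE_ER_WR> is 0;
      - otherwise crash the system.
    """
    if writeback_error_in_flush:
        return "writeback_error"
    return "crash" if pte_er_wr else "restart"


print(pte_data_ram_error_disposition(writeback_error_in_flush=False, pte_er_wr=False))
```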

15.8.1.18.2 NDAL PTE Read Timeout Errors 

Description: A PTE read timed out before any fill quadword was received. This is not an 
accepted means for a system environment to notify the NVAX CPU of "non-existent memory or 
IO location". The error could be caused by an error in the system environment or an NDAL 
parity error on the returned data. It also could be caused by some previous error in the system 
environment or this CPU which leaves a cache block marked as owned in memory and not marked 
as owned in any cache in the system. 

S_CEFSTS<COUNT> indicates the number of quadwords received before the error. 
(S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space.) The physical 
address is in S_CEFADR. 

CEFSTS<WRITE> should not be set. If it is, it is an inconsistent status (see Section 15.5.2.7). 

The physical address of the PTE is in S_CEFADR. The Bcache is not in ETM. The read could not 
have been an ownership read, so this error can not have caused the ownership bits in memory to 
be left in the wrong state. 

S_CEFSTS<LOST_ERR> may be set. This error is probably due to the same PTE error occurring 
more than once. This is an acceptable assumption unless a hard error interrupt occurs after 
handling this error. 

Recovery procedures: Clear CEFSTS<LOCK, TIMEOUT>. Clear PCSTS<PTE_ER>. 

Restart condition: Restart if: 

(S_PCSTS<PTE_ER_WR> = 0). 

Otherwise, crash the system. 

Post Restart Recovery: If the same fill error recurs on restart, then the block is probably 
"lost". 1 Software must determine if the error is fatal to one process or the whole system and take 
appropriate action. (If it is fatal only to one process, use the system dependent procedure for 
resetting memory's ownership bit.) 

NOTE 

It may be appropriate in this case to first cause each CPU in the system to flush its 
Bcache, and then restart once more. 

NOTE 

It may be that another error (such as an uncorrectable tag store error on a coherence 
request) will be repaired by the soft error interrupt handler before the restart actually 
occurs, fortuitously repairing the cause of the fill error. 



15.8.1.18.3 NDAL PTE Read Data Errors 

Description: A PTE read ended with an RDE (read data error) NDAL cycle before any of the fill 
quadwords were received. This is an accepted means for a system environment to notify the 
NVAX CPU of "non-existent memory or IO location". Otherwise, the error could be caused by an 
error in the system environment. It also could be caused by some previous error in the system 
environment or this CPU which leaves a cache block marked as owned in memory and not marked 
as owned in any cache in the system. 

S_CEFSTS<COUNT> indicates the number of quadwords received before the error. 
(S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space.) The physical 
address is in S_CEFADR. 

CEFSTS<WRITE> should not be set. If it is, it is an inconsistent status (see Section 15.5.2.7). 

The physical address of the PTE is in S_CEFADR. The Bcache is not in ETM. The read could not 
have been an ownership read, so this error can not have caused the ownership bits in memory to 
be left in the wrong state. 

S_CEFSTS<LOST_ERR> may be set. This error is probably due to the same PTE error occurring 
more than once. This is an acceptable assumption unless a hard error interrupt occurs after 
handling this error. 

Recovery procedures: Clear CEFSTS<LOCK, RDE>. Clear PCSTS<PTE_ER>. 

Restart condition: Restart if: 

(S_PCSTS<PTE_ER_WR> = 0). 

Otherwise, crash the system. 

Post Restart Recovery: If the same fill error recurs on restart, then the block is probably 
"lost". 1 Software must determine if the error is fatal to one process or the whole system and take 
appropriate action. (If it is fatal only to one process, use the system dependent procedure for 
resetting memory's ownership bit.) 

NOTE 

It may be appropriate in this case to first cause each CPU in the system to flush its 
Bcache, and then restart once more. 

NOTE 

It may be that another error (such as an uncorrectable tag store error on a coherence 
request) will be repaired by the soft error interrupt handler before the restart actually 
occurs, fortuitously repairing the cause of the fill error. 



15.8.1.18.4 Unacknowledged NDAL PTE Read 

Description: A PTE read was no-ACKed by the system environment. This could be because the 
external component(s) received bad NDAL parity or it could be due to a system-specific notification 
of "non-existent memory or IO location". 

The physical address of the PTE is in S_NEOADR. The Bcache is not in ETM. 

S_NESTS<LOST_OERR> may be set. This error is probably due to the same PTE error occurring 
more than once. This is an acceptable assumption unless a hard error interrupt occurs after 
handling this error. 

Recovery procedures: Clear NESTS<NOACK>. Clear PCSTS<PTE_ER>. 

Restart condition: Restart if: 

(S_PCSTS<PTE_ER_WR> = 0). 

Otherwise, crash the system. 

15.8.1.18.5 Multiple Errors Which Interfere with Analysis of PTE Read Error 

Because PTE read errors lead to several unusual cases, restart is not recommended in the event 
that other errors cloud the analysis of the PTE read error. 

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both. 
Recovery procedures: No specific recovery action is called for. 
Restart condition: No restart is possible. Crash the system. 

15.8.1.19 NDAL Parity Errors 

Description: A cycle with a parity error was received by the NVAX CPU chip from the NDAL. If 
it is an inconsistent parity error, another node acknowledged the transaction despite the parity 
error seen by the NVAX chip. The Bcache is in ETM. The Bcache is coherent with memory 
because it only accesses VALID-OWNED locations in the Bcache data RAMs once in ETM. Some 
other node's request may timeout because the Cbox missed a coherency request for writeback. 
The Pcache may now be incoherent since an NDAL write to a Bcache VALID-UNOWNED location 
may have been missed. 

In some systems (e.g., OMEGA), a no-ACK on an NDAL command implies no effect from that 
command took place. This makes NDAL parity errors very recoverable. In other systems (e.g., 
XMI2), a no-ACK on an NDAL command does not imply this (for invalidates forwarded from the 
XMI2 bus), and all parity errors imply possible lost invalidates and incoherent Pcache. 

Recovery procedure: Clear NESTS<PERR> and NESTS<INCON_PERR>. Flush the Bcache. 
Clear CCTL<HW_ETM> (after flushing the Bcache). 

Restart condition: If no-ACK in the specific system implies a command was not effective, and 
if the error was not an inconsistent parity error, restart. Otherwise, it isn't possible to determine 
whether the interrupted instruction stream may have seen the effect of out of order writes 
because of the Pcache missing an invalidate. Crash the system. 
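
The restart condition for NDAL parity errors depends on one system-specific property and one status bit. The Python sketch below models that rule; the function name and the small table of systems are assumptions made for illustration, with the OMEGA and XMI2 entries taken from the description above.

```python
# Whether a no-ACK on an NDAL command implies the command had no effect.
# Entries follow the text: true for OMEGA, not guaranteed for XMI2.
NOACK_MEANS_NO_EFFECT = {"OMEGA": True, "XMI2": False}


def parity_error_restart_ok(system, inconsistent_perr):
    """Return True if restart is permitted after an NDAL parity error.

    Hypothetical helper: restart only when a no-ACK implies the command
    was not effective in this system AND NESTS<INCON_PERR> was not set.
    Unknown systems are treated conservatively (no restart).
    """
    return NOACK_MEANS_NO_EFFECT.get(system, False) and not inconsistent_perr


print(parity_error_restart_ok("OMEGA", inconsistent_perr=False))
print(parity_error_restart_ok("XMI2", inconsistent_perr=False))
```

Treating unlisted systems as non-restartable is a design choice of this sketch, made because a missed invalidate can leave the Pcache incoherent.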

15.8.1.20 Lost Parity Errors 

Description: Some number of cycles with parity errors were received by the NVAX CPU chip 
from the NDAL. Some may have been inconsistent parity errors. The Bcache is in ETM. The 
Bcache is coherent with memory because it only accesses VALID-OWNED locations in the Bcache 
data RAMs once in ETM. Some other node may timeout because the Cbox missed a coherency 
request for writeback. The Pcache may now be incoherent since an NDAL write to a Bcache 
VALID-UNOWNED location may have been missed. 

Recovery procedure: Clear NESTS<LOST_PERR>. Flush the Bcache. Clear 
CCTL<HW_ETM> (after flushing the Bcache). 

Restart condition: It isn't possible to determine whether the interrupted instruction stream 
may have seen the effect of out of order writes because of the Pcache missing an invalidate. 
Crash the system. 

15.8.1.21 System Environment Soft Error Interrupts 

Description: Errors which occur in the system environment and do not result in loss of data or 
which can notify the NVAX CPU by returning RDE also notify the CPU of the error by asserting 
S_ERR_L (e.g., read errors). Errors which are corrected automatically by hardware and do not 
result in loss of data should use soft error interrupt notification. 

NOTE 

It is important that components in the system environment which assert S_ERR_L 
have a CPU accessible register which reports the S_ERR_L assertion. 

Attention should be given to the robustness of tagged-bad data schemes. If error detection 
for these schemes is good enough, then error recovery may be able to ignore lost soft 
errors. Lost soft errors are very possible in NVAX systems because the first error doesn't 
normally prevent NVAX from continuing to issue new requests (due to macropipelining). 

Similarly, good error detection schemes on the ownership bits in memory may facilitate 
recovery from lost soft errors. 

It is also recommended that an address be stored where applicable. Stored addresses allow 
software to improve the system's chance of surviving an error event without crashing by 
cleaning up tagged-bad locations and the like. For example, a write timeout clearing a 
page in the VMS page handler may be unrecoverable, while clearing that tagged-bad 
data location before it ever got to the page handler might be quite recoverable. 

Recovery procedures: Clear the error status bits in the system registers and perform any 
necessary system dependent recovery procedure. 

Restart condition: Typically, restart is possible, though in cases where data is lost software 
may have to kill one process or crash the system. 

15.8.1.22 Inconsistent Status in Soft Error Interrupt Analysis 

Description: A presumed impossible error report was found in the error registers. This could 
be due to a hardware failure or bug. 

Recovery procedures: No specific recovery action is called for. 

Restart condition: No restart is possible. The integrity of the entire system is questionable. 
Crash the system. 

NOTE 

This status can result if a machine check occurs. Software may employ some mechanism 
for determining that this occurred, but it must be sure that mechanism can't ever falsely 
indicate that an inconsistent status is acceptable. Inconsistent status is a serious 
problem and should not be ignored. 

15.9 Kernel Stack Not Valid Exception 

A Kernel Stack Not Valid Exception occurs when a memory management exception is detected 
while attempting to push information on the kernel stack during microcode processing of another 
exception. Note that a console halt with an error code of ERR_INTSTK is taken if a memory 
management exception is encountered while attempting to push information on the interrupt 
stack. 

The Kernel Stack Not Valid exception is dispatched through SCB vector 08 (hex) with the stack 
frame shown in Figure 15-11. 

Figure 15-11: Kernel Stack Not Valid Stack Frame 

15.10 Error Recovery Coding Examples 

To be supplied. 

15.11 Miscellaneous Background Information 

This section contains miscellaneous background information relevant to this error handling 
chapter. 

15.11.1 Note On Tagged-Bad Data Mechanisms 

Writebacks which are sent as BADWDATA are supposed to appear as tagged-bad data in memory, 
and further reads to that block should fail. In some systems, tagged bad data is implemented by 
a mechanism as reliable as that used to store data. In at least one system (OMEGA), tagged-bad 
data is implemented by altering the ECC code of the data as it is written. Some single-bit and 
many double-bit errors in this data can make it appear to be correctable or correct when read. 
This is less protection from error than valid data has. In such a system, an error which results 
in a lost tagged-bad-data block is reason to crash the system. In systems with reliable storage 
of "tagged-bad-data", operation can continue after such an error because it is essentially certain 
that any process which accesses that data will see an RDE error for that block and will machine 
check before it uses the bad data. 

The Bcache data RAMs in NVAX use the above relatively unreliable mechanism for tagged-bad 
data. Three ECC check bits are flipped in the stored value. This mechanism would often prevent 
a subsequent read from succeeding, but it is not sufficiently reliable to allow missing tagged-bad 
blocks in the Bcache to be tolerated. As a result, all errors which may have left a tagged-bad 
block in the Bcache without some error address register pointing it out are cause to crash the 
system. 

15.11.2 Note On Ownership Mechanism 

In the absence of additional errors, the memory/cache ownership mechanism ensures that no 
other process can access the block whose ownership bit is set in memory and is not owned by any 
cache. Cache coherence in the system depends on this mechanism. In some systems, memory 
error detection and correction for ownership bits is as reliable as for data. This is true of XMI2 
based systems. However, in some systems the mechanism is less reliable. One example is the 
OMEGA system, where the ownership bits are stored with a single-bit-error-detect-and-correct 
scheme which can not detect most double bit errors and therefore interprets most double bit errors 
as correctable single bit errors. In such a system, error situations in which unknown blocks in 
memory may be owned should be taken as a system crash. 

In OMEGA, there is a proposal to make up for the non-robust ownership bit error detection scheme 
by flushing the cache on every "correctable" ownership bit error in the NMC. If the "correctable" 
error really is an uncorrectable error, this may be detected by a WDISOWN to an unowned 
memory location. This is because some uncorrectable errors are seen as correctable errors, so 
one ownership bit is flipped by memory's error correction hardware and at least two bits were 
wrong to start with. There is a chance that the "correction" flips one of the bad bits, but it could 
also flip one of the remaining correct bits. This leaves the memory with one or three incorrect 
ownership bits after an uncorrectable "correctable" error. If every cache is flushed immediately 
after a "correctable" error, then writebacks to apparently unowned locations may result if the 
error is inadvertently made worse by the correction scheme. These are detectable protocol errors 
and should lead to a system crash. If the effect of the error correction was to mark block(s) as 
owned when no cache owns them, then eventually some process will attempt to access that data 
and time out. If the error was successfully corrected, then flushing the caches causes a pause 
in processing and no bad effects. If these errors are infrequent, this seems an acceptable loss in 
performance in exchange for increased reliability. 
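
The way a single-error-correcting code can make a double-bit error worse is easy to demonstrate on a toy code. The Python sketch below uses a textbook Hamming(7,4) code, which is an assumption chosen purely for illustration: it is not the OMEGA ownership-bit code, whose details are not given here, and all function names are hypothetical. In this toy code every double-bit error produces a nonzero syndrome that points at a third, previously correct bit, so the "correction" leaves three wrong bits. This corresponds to the three-incorrect-bits outcome mentioned above; the toy code does not exhibit the one-incorrect-bit outcome.

```python
from itertools import combinations


def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7).

    Parity bits occupy positions 1, 2 and 4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(word):
    """Apply single-error correction; return (corrected_word, syndrome)."""
    syndrome = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= position
    fixed = list(word)
    if syndrome:
        fixed[syndrome - 1] ^= 1  # flip the bit the syndrome points at
    return fixed, syndrome


def wrong_bits_after_double_error(data_bits, i, j):
    """Flip bits i and j (0-based), 'correct', and count remaining wrong bits."""
    good = hamming74_encode(data_bits)
    bad = list(good)
    bad[i] ^= 1
    bad[j] ^= 1
    fixed, _ = hamming74_correct(bad)
    return sum(a != b for a, b in zip(good, fixed))


outcomes = {wrong_bits_after_double_error([1, 0, 1, 1], i, j)
            for i, j in combinations(range(7), 2)}
print(outcomes)
```

Because the decoder cannot tell a double-bit error from a single-bit one, it reports a successful correction in every one of these cases, which is why the text treats such "correctable" errors with suspicion.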

15.12 Revision History 

Table 15-6: Revision History 

Who               When          Description of change 
--------------    -----------   ------------------------------------------------------------------- 
Mike Uhler        06-Mar-1989   Release for external review. 
Mike Uhler        19-Dec-1989   Update for second-pass release. 
John Edmondson    12-Feb-1990   Update with error handling information. 
John Edmondson    30-Jun-1990   Update further after internal review and resolution of many issues. 
John Edmondson    31-May-1991   Minor updates for pass 2 changes. 

Chapter 16 
Chip Initialization 



16.1 Overview 

This chapter describes the hardware initialization process for the NVAX CPU chip. The hardware 
and microcode start the initialization, and then pass control to the console macrocode at address 
E0040000 for further initialization. 

Much of the job of initialization involves setting the NVAX internal processor registers (IPRs) 
to a known state, or using NVAX IPRs to perform functions such as cache initialization. See 
Chapter 2 for a list of the NVAX IPRs. Also, see the individual box chapters for a more in depth 
definition of many of the IPRs. 

16.2 Hardware/Microcode Initialization 

The NVAX Chip hardware initializes to the following state on powerup or the assertion of chip 
reset: 

1. The VIC, Pcache, and Bcache are disabled. 

2. The RLOG is cleared. 

3. The Fbox and vector unit are disabled. 

4. The microstack is cleared. 

5. The Mbox and Cbox are reset, and all previous operations are flushed. 

6. The Fbox is reset. 

7. The Ibox is stopped, waiting for a LOAD PC. 

8. All instruction and operand queues are flushed. 

9. All MD valid bits are cleared, and all Wn valid bits are set. 

10. A powerup microtrap is initiated which starts the Ebox at the label IE.POWERUP. 

The NVAX Chip microcode then does the following: 

1. Hardware interrupt requests are cleared. 

2. ICCS<6> is set to 0. 

3. SISR<15:1> is set to 0. 

4. ASTLVL is set to 4. 

5. The Mbox PAMODE IPR is set to 30-bit physical address mode. 

6. CPUID is set to 0. 

7. The BPCR branch history algorithm is reset to the default value. 

8. Backup PC is retrieved from the Ibox and saved in SAVPC. 

9. PME is cleared. 

10. The current PSL, halt code, and value of MAPEN are saved in SAVPSL. 

11. MAPEN is cleared (memory management is disabled). 

12. All state flags are cleared. 

13. PSL is loaded with 041F0000. 

14. PC is loaded with E0040000 (the address of the start of the console code). 
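
The PSL value loaded in step 13 can be decoded into its fields to show what state the console starts in. The Python sketch below is an illustration only: the function name is hypothetical, and the field positions used (IS in bit 26, current mode in bits 25:24, previous mode in bits 23:22, IPL in bits 20:16) are those of the VAX architecture's PSL definition rather than anything stated in this chapter.

```python
def decode_psl(psl):
    """Extract selected fields from a 32-bit VAX PSL value.

    Field positions follow the VAX architecture PSL layout:
    IS = bit 26, CUR_MOD = bits 25:24, PRV_MOD = bits 23:22, IPL = bits 20:16.
    """
    return {
        "IS": (psl >> 26) & 0x1,
        "CUR_MOD": (psl >> 24) & 0x3,
        "PRV_MOD": (psl >> 22) & 0x3,
        "IPL": (psl >> 16) & 0x1F,
    }


fields = decode_psl(0x041F0000)
print(fields)
```

Decoded this way, 041F0000 (hex) corresponds to kernel mode (mode 0) on the interrupt stack with the interrupt priority level at its maximum of 31.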

16.3 Console Initialization 

The console macrocode has the job of filling the gap between the initialized state described above 
and the initial state needed for the operating system. To that end, the console code does the 
following: 

1. Set CPUID to the correct value from the system environment. 

2. Set ECR (Ebox Control Register) as follows: 

   1. Set FBOX_ENABLE to enable the Fbox. 
   2. Set S3_TIMEOUT_EXT as required by the system environment. 
   3. Set FBOX_ST4_BYPASS_ENABLE to enable Fbox stage 4 bypass. 
   4. Write one to S3_STALL_TIMEOUT to clear any error. 
   5. Set ICCS_EXT as required by the system environment. 

3. Set ICSR (Ibox Control Status Register) as follows: 

   1. Clear ENABLE to leave the VIC disabled. 
   2. Write one to LOCK to clear any error. 

4. Set the PAMODE register MODE bit as required by the system. 

5. Write one to clear the LOCK bit in TBSTS (Translation Buffer Status). 

6. Initialize the PCSTS (Pcache Status) Register: 

   1. Write one to clear the LOCK bit. 
   2. Write one to clear PTE_ER_WR. 
   3. Write one to clear PTE_ER. 

7. Set CCTL (Cbox Control) as follows: 

   1. Clear ENABLE to leave the Bcache disabled. 
   2. Set TAG_SPEED, DATA_SPEED, and SIZE to reflect the Bcache RAM configuration in 
      the system. 
   3. Clear FORCE_HIT. 
   4. Clear DISABLE_ERRORS. 
   5. Clear SW_ECC. 
   6. Clear TIMEOUT_TEST. 
   7. Clear DISABLE_PACK to allow the write packing feature. 
   8. Clear SW_ETM. 
   9. Write one to clear HW_ETM. 

8. Clear the various Cbox error registers: 

   1. BCETSTS (Bcache Error Tag Status): Write one to LOCK, CORR, UNCORR, BAD_ADDR, 
      and LOST_ERR to clear any errors. 
   2. BCEDSTS (Bcache Error Data Status): Write one to LOCK, CORR, UNCORR, BAD_ADDR, 
      and LOST_ERR to clear any errors. 
   3. CEFSTS (Cbox Error Fill Status): Write one to RDLK, LOCK, TIMEOUT, RDE, and 
      LOST_ERR to clear any errors. 
   4. NESTS (NDAL Error Status): Write one to NOACK, BADWDATA, LOST_OERR, PERR, 
      INCON_PERR, and LOST_PERR to clear any errors. 
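
Many of the steps above rely on write-one-to-clear status bits: writing a 1 to an error bit clears it, and writing a 0 leaves it unchanged. The Python sketch below models that behavior. It is a toy model under stated assumptions: the class name is hypothetical and the bit positions assigned to NOACK, LOST_OERR, and PERR are invented for the example and do not reflect the real NESTS layout.

```python
class W1CRegister:
    """Toy model of a status register whose error bits are write-one-to-clear."""

    def __init__(self, value=0):
        self.value = value

    def write(self, mask):
        # Writing 1 to a bit clears it; writing 0 leaves it unchanged.
        self.value &= ~mask


# Hypothetical bit positions, chosen only for this illustration.
NOACK = 1 << 0
LOST_OERR = 1 << 1
PERR = 1 << 2

nests = W1CRegister(NOACK | PERR)       # two errors latched
nests.write(NOACK)                      # clear only NOACK
remaining = nests.value                 # PERR is still latched
nests.write(NOACK | LOST_OERR | PERR)   # clear everything, as the console does
print(remaining == PERR, nests.value == 0)
```

One consequence of this scheme is that software can clear a single error without a read-modify-write sequence, so it cannot accidentally clear an error that was latched between the read and the write.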



16.4 Cache Initialization 

Either the console code or the operating system will do the following final initialization steps 
(code examples are given): 

1. Initialize the VIC 

        ; This code initializes the VIC by writing all 128 tags with 
        ; good parity and all valid bits clear. 

        movl    #^x00000020, r0     ; tag index increment = 1 hexaword block 
        movl    #0, r1              ; tag init value 
        movl    #0, r2              ; VIC tag starting address 
        movl    #^x00000800, r3     ; VIC tag ending address + 1 block 
        movl    #PR19$_VMAR, r4     ; VIC memory address register (VMAR) 
        movl    #PR19$_VTAG, r5     ; VIC tag register (VTAG) 

vic_loop: 
        mtpr    r2, r4              ; write current index to VMAR 
        mtpr    r1, r5              ; write the tag via VTAG 
        addl2   r0, r2              ; increment index by the block size 
        cmpl    r3, r2              ; check if done 
        bneq    vic_loop 


2. Enable the VIC 



mtpr #<icsr$m_enable+icsr$m_lock>, #PR19$_ICSR 

3. Initialize the Bcache tags 

This code initializes the Bcache by writing all tags with good 
ECC and all valid and owned bits clear. This example initializes 
a 512Kb Bcache. This code can be changed to init the other legal 
Bcache sizes by changing the value in R3. SW_ECC in CCTL is clear, 
so the CBOX will generate correct ECC for the tag/valid/owned bits. 



        movl    #^x00000020, r0     ; tag index increment = 1 hexaword block 
        movl    #0, r1              ; tag init value 
        movl    #^x01000000, r2     ; Bcache tag starting address 
        movl    #^x01080000, r3     ; Bcache tag ending address + 1 block 
                                    ; for 512Kb Bcache 

bcache_loop: 
        mtpr    r1, r2              ; write tag to current tag address 
        addl2   r0, r2              ; increment index by the block size 
        cmpl    r3, r2              ; check if done 
        bneq    bcache_loop 
4. Initialize the Bcache data 

        .SBTTL  ZERO_BCACHE_DATA 
;++ 
; ZERO_BCACHE_DATA - Write zero data and good ECC to the BCACHE data rams 

BYTES_PER_QUADWORD = 8 
BYTES_PER_PAGE = 512 
QUADWORDS_PER_PAGE = BYTES_PER_PAGE / BYTES_PER_QUADWORD 

ZERO_BCACHE_DATA: 
        PUSHR   #^M<R0,R1,R2,R3,R4,R5,R6>                   ; Save registers 
        MFPR    #PR$_CPUID, R5                              ; XMI node id 
        MOVL    SYSL$L_BACKUP_CACHE_CONSTANT[R5], R1        ; Formative cache constant 
        MTPR    R1, #PR13$_CCTL                             ; Set cache with default constant 
        EXTZV   #PR13_CCTL$V_SIZE,#PR13_CCTL$S_SIZE,R1,R2   ; Extract backup cache size 
        MOVL    SYSL$L_BCACHE_PAGE_CONSTANT[R2], R5         ; Cache page count 
        CLRL    R6                                          ; "AOB" index 
        CLRQ    R1                                          ; Quadword data to be written to BCACHE rams 
10$: 
        MULL3   #BYTES_PER_PAGE,R6,R3                       ; BCACHE page index to write 
        BSBW    MAP_PHYSICAL_ADDRESS                        ; Map R3 PA to R4 VA 
        CLRL    R3                                          ; "AOB" index 
20$: 
        JSB     @IO_WRITE_BCACHE_DATA                       ; Write BCACHE data 
        ADDL2   #BYTES_PER_QUADWORD,R4                      ; Update VA 
        AOBLSS  #QUADWORDS_PER_PAGE,R3,20$                  ; Loop 'til done 
        AOBLSS  R5,R6,10$                                   ; Loop 'til done 
        MFPR    #PR$_CPUID, R5                              ; XMI node id 
        MOVL    SYSL$L_BACKUP_CACHE_CONSTANT[R5],R1         ; Formative cache constant 
        MTPR    R1, #PR13$_CCTL                             ; Set cache with default constant 
        POPR    #^M<R0,R1,R2,R3,R4,R5,R6>                   ; Restore registers 
        RSB                                                 ; Return 



        .SBTTL  MAP_PHYSICAL_ADDRESS 
;++ 
; MAP_PHYSICAL_ADDRESS - Map a physical address with a system VA 
; INPUTS: 
;       R3 - Physical address to map to system VA 
; OUTPUTS: 
;       R4 - System VA of physical address in R3 

MAP_PHYSICAL_ADDRESS: 
        PUSHR   #^M<R0,R1,R9>                               ; Save registers 
        BSBW    GET_XNP_NUMBER                              ; CPU number to R9 
        MOVAL   @SYSLOA_SPTE[R9],R0                         ; Address of this CPU's SPTE 
        BICL2   #PTE$M_VALID,(R0)                           ; Invalidate SPTE 
        INVALIDATE_TB ENVIRON=UNMAPPED                      ; Invalidate this TB 
        MOVL    (R0),R1                                     ; SPTE 
        EXTZV   #VA$V_VPG,#PTE$S_PFN,R3,R4                  ; Address PFN 
        INSV    R4,#PTE$V_PFN,#PTE$S_PFN,R1                 ; Insert the PFN 
        BISL3   #<PTE$M_VALID!PTE$M_MODIFY!PTE$C_KW>,R1,(R0) ; Map PFN 
        EXTZV   #VA$V_BYTE,#VA$S_BYTE,R3,R0                 ; Address byte offset 
        MULL3   #512,R9,R1                                  ; This CPU's page offset 
        MOVAB   @SYSLOA_SPTE_VA[R1],R4                      ; VA that this CPU's SPTE maps 
        INSV    R0,#VA$V_BYTE,#VA$S_BYTE,R4                 ; A VA that maps physical address 
        POPR    #^M<R0,R1,R9>                               ; Restore registers 
        RSB                                                 ; Return 
;++ 
; INPUTS: 
;       R1 = Lo longword data to be written to BCACHE 
;       R2 = Hi longword data to be written to BCACHE 
;       R4 = Virtual address that maps physical address corresponding 
;            to secondary cache index to be written. 
; OUTPUTS: 
;       R0 LBS indicates BCACHE data written, otherwise clear 
; 
;****************************************************************************** 
; This routine cannot be stepped through using XDELTA. The FORCEHIT bit 
; in the backup cache control is set and will cause erroneous hits to 
; occur in the secondary cache. 
;****************************************************************************** 
        .ALIGN  LONG 

IO_WRITE_BCACHE_DATA_ROUTINE: 
        MOVL    R3, IO_SAVED_REGISTER           ; Save register 
        MTPR    #0, #PR$_TBIA                   ; Reset TB allocation pointer 
        CLRL    R0                              ; Signal failure 
        TSTL    10$                             ; Ensure TB hit 
        TSTL    30$                             ; Ensure TB hit 
        TSTL    (R4)                            ; Ensure TB hit 
        TSTL    B^4(R4)                         ; Ensure TB hit 
        MOVAB   10$, R3                         ; Address to check 
        MTPR    R3, #PR$_TBCHK                  ; In TB 
        BVC     20$                             ; If VC no 
        MOVAB   30$, R3                         ; Address to check 
        MTPR    R3, #PR$_TBCHK                  ; In TB 
        BVC     20$                             ; If VC no 
        MOVAL   (R4),R3                         ; Address to check 
        MTPR    R3, #PR$_TBCHK                  ; In TB 
        BVC     20$                             ; If VC no 
        MOVAL   B^4(R4),R3                      ; Address to check 
        MTPR    R3, #PR$_TBCHK                  ; In TB 
        BVC     20$                             ; If VC no 
10$: 
        MFPR    #PR13$_CCTL, R3                 ; Read CCTL 
        BICL2   #<-                             ; Form a mask 
                <1@PR13_CCTL$V_FORCE_HIT>!-     ; Force hit mode 
                <1@PR13_CCTL$V_DISABLE_ERRORS>!- ; Disable errors 
                <0>- 
                >,R3                            ; Local copy control register 
        BISL3   #<-                             ; Form a mask 
                <1@PR13_CCTL$V_ENABLE>!-        ; Enable BCACHE 
                <1@PR13_CCTL$V_FORCE_HIT>!-     ; Force hit mode 
                <1@PR13_CCTL$V_DISABLE_ERRORS>!- ; Disable errors 
                <0>- 
                >,R3,R0                         ; Local copy control register 
        MTPR    R0, #PR13$_CCTL                 ; Enable Bcache - FORCE HIT, DISABLE ERRORS 
        MFPR    #PR13$_CCTL,R0                  ; Allow the dust to settle... 
        MOVQ    R1, (R4)                        ; Write BCACHE data 
        MTPR    R3, #PR13$_CCTL                 ; BCACHE off 
        MFPR    #PR13$_CCTL,R3                  ; Allow the dust to settle... 
        MOVL    #SS$_NORMAL,R0                  ; Signal success 
20$: 
        MOVL    IO_SAVED_REGISTER, R3           ; Restore register 
        RSB                                     ; Return 
30$: 

5. Initialize the Pcache 



This code initializes the Pcache by writing all 256 tags with 
good parity and all valid bits clear. 

        movl    #^x00000020, r0     ; tag index increment = 1 hexaword block 
        movl    #0, r1              ; tag init value 
        movl    #^x01800000, r2     ; Pcache tag starting address 
        movl    #^x01802000, r3     ; Pcache tag ending address + 1 block 

pcache_loop: 
        mtpr    r1, r2              ; write tag to current tag address 
        addl2   r0, r2              ; increment index by the block size 
        cmpl    r3, r2              ; check if done 
        bneq    pcache_loop 

6. Enable the Bcache and the Pcache 

NVAX cache coherency requires that the Pcache always be a subset of the Bcache. The code 
that enables the caches is arranged to ensure that this is true. Thus, the Bcache is enabled first, 
and an REI is executed between the Bcache enable and the Pcache enable. The purpose of 
the REI is to synchronize data prefetching such that the Pcache will not perform any fills to 
addresses that were not also filled in the Bcache. 



        mfpr    #PR19$_CCTL, r6         ; get current value in Cbox CTL IPR 
        bisl2   #<cctl$m_enable>, r6    ; set the Bcache enable bit 
        mtpr    r6, #PR19$_CCTL         ; write the new Cbox CTL IPR 
        movpsl  -(sp)                   ; push the psl 
        moval   init_cont, -(sp)        ; and the next PC 
        rei                             ; branch to the next PC 
                                        ; flushing the VIC 
                                        ; and aborting all 
                                        ; previous I READS 

init_cont: 
                                        ; Now that state is synchronized, enable 
                                        ; the Pcache 
        mtpr    #<pcctl$m_d_enable+pcctl$m_i_enable+pcctl$m_p_enable>, #PR19$_PCCTL 
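The Bcache and Pcache tag loops in steps 3 and 5 share one shape: write a tag, add the hexaword increment, and stop when the address equals the "ending address + 1 block" value. The Python sketch below is only a model of that address arithmetic, useful for checking a start/end pair before changing the constants. The function name is a hypothetical helper, not part of any NVAX software.

```python
BLOCK_SIZE = 0x20  # one hexaword (32 bytes), the increment used by each loop


def tag_write_addresses(start, end_plus_one_block, step=BLOCK_SIZE):
    """Addresses written by a write/ADDL2/CMPL/BNEQ loop before it terminates."""
    return list(range(start, end_plus_one_block, step))


# Constants taken from the Pcache and 512Kb Bcache examples above.
pcache = tag_write_addresses(0x01800000, 0x01802000)
bcache_512k = tag_write_addresses(0x01000000, 0x01080000)
print(len(pcache), len(bcache_512k))
```

With the constants shown, the model yields 256 writes for the Pcache example, matching the tag count stated in step 5, and 16384 writes for the Bcache example.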



16.5 Miscellaneous Information 

There is no need to explicitly initialize the Translation Buffer as the NVAX microcode performs 
an internal TBIA on any MTPR to the MAPEN IPR. 

There is no need to explicitly initialize the data portions of the VIC or Pcache as long as the tags 
are initialized with all valid bits clear. Both Bcache tags and Bcache data must be initialized 
before the cache is enabled. 
16.6 Revision History 



Table 16-1: Revision History 

Who               When          Description of change 

Debra Bernstein   9-May-1990    Initial edit 

Debra Bernstein   19-Nov-1990   Add Miscellaneous Information section. Add true code 
                                examples for cache init. Add information on the 
                                ordering of cache enable. 

Debra Bernstein   11-Mar-1991   Update to pcsts, tbsts 

Rebecca Stamm     9-Oct-1991    Bcache data must be initialized as well as the Bcache 
                                tags. 
Chapter 17 
Chip Clocking 



17.1 Overview of the NVAX Clocking System 

The NVAX CPU generates all the clock signals required to operate the CPU and the NDAL 
interface. The clocks are derived from a high frequency oscillator signal that is supplied to the 
chip. To allow for flexible logic design the chip implements a four phase clocking system. The 
four internal NVAX clock phases are generated on-chip by dividing the frequency of the external 
oscillator by four. 

The NVAX chip generates and drives the NDAL clocks which are used to clock the peripheral 
chips on the NDAL bus. The NDAL also uses a four phase clock scheme, but runs three times 
slower than the internal NVAX clocks. 
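The two division ratios described above fix the clock rates once the oscillator frequency is known. The following Python sketch works through the arithmetic for the 400 MHz oscillator figure quoted in Section 17.2.1. The function name is an illustrative assumption.

```python
def nvax_clock_rates(osc_hz):
    """Derive internal and NDAL clock rates from the external oscillator rate."""
    internal_hz = osc_hz / 4       # internal clocks: oscillator divided by four
    ndal_hz = internal_hz / 3      # NDAL clocks run three times slower
    return internal_hz, ndal_hz


internal_hz, ndal_hz = nvax_clock_rates(400e6)
internal_cycle_ns = 1e9 / internal_hz   # NVAX cycle time in nanoseconds
ndal_cycle_ns = 1e9 / ndal_hz           # NDAL cycle time in nanoseconds
```

For a 400 MHz oscillator this gives a 100 MHz internal clock (a cycle of about 10 ns) and an NDAL cycle of about 30 ns.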

17.2 Receiving the NVAX External Oscillator Signal 

The NVAX chip can receive the external clock from one of two sources depending on the state of the 
OSC_TEST_H pin. When OSC_TEST_H is asserted the clock is received by the OSC_TC1_H 
and OSC_TC2_H pins. These pins are configured to use standard 3V CMOS signal levels. When 
OSC_TEST_H is deasserted the clock is received by the OSC_H and OSC_L pins. These pins 
use a differential amplifier circuit to receive the clock signal from an ECL oscillator. Figure 17-1 
shows the NVAX clock interface circuitry. 

EXTERNAL OSCILLATOR 

Detailed information concerning the design of the external oscillator can be found in 
the NVAX Signal Integrity Specification. 

17.2.1 The System Environment 

During normal system operation OSC_TEST_H is tied low and the OSC_H and OSC_L pins are 
used to receive the external clock source. The NVAX CPU is designed to operate at a maximum 
internal clock speed of 100 MHz. This requires the external oscillator to deliver a 400 MHz 
clock. At these frequencies the generation and interconnection of signals is extremely complex 
and specialized circuitry must be used. 
The NVAX oscillator generates a pair of clock signals that are 180 degrees out of phase. The 
oscillator does not supply standard CMOS logic levels. The signals have a peak to peak voltage 
swing of 0.5 volts centered at 3.5 volts; therefore, a standard CMOS input buffer cannot be used 
on the chip to receive the signals. Instead, a differential amplifier is used and the signals are AC 
coupled and level shifted before they are received by the amplifier. 

17.2.2 The Chip Test Environment 

The chip tester, used during chip manufacturing to functionally verify the part, cannot supply 
a 400 MHz clock. The two pins OSC_TC1_H and OSC_TC2_H offer an alternative method for 
supplying the chip with clocks. The pin OSC_TEST_H is used to select between the system and 
test clocking modes. When the pin is asserted the test clock pins supply the clock to the chip. 
The test clock pins are supplied with two clock signals that are 90 degrees out of phase. They 
are XORed on-chip to generate the internal 2X clock signal K_PAD_CK1%ZZ. Figure 17-2 shows 
the relationship between K_PAD_CK1%ZZ and the test clock input signals. 

Figure 17-2: On-Chip XOR Test Functionality Waveforms 
In addition to the frequency doubling feature of the test clock input circuitry, the pins use CMOS 
differential amplifiers to receive the clock signals. Hence, the test oscillator clock inputs can be 
used to drive the chip at slower than maximum speeds using standard 3 volt CMOS logic levels. 
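The frequency doubling obtained by XORing two clocks that are 90 degrees apart can be checked with a small sampled model. The Python sketch below uses idealized 50% duty-cycle square waves and an arbitrary sample resolution; the variable names `tc1`, `tc2`, and `xor` are illustrative stand-ins for the two test clock inputs and the resulting 2X clock, not signal names from the design.

```python
def square(t, period, delay=0):
    """Idealized 50% duty-cycle square wave sampled at integer time t."""
    return 1 if (t - delay) % period < period // 2 else 0


def rising_edges(samples):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a == 0 and b == 1)


PERIOD = 8                 # samples per test-clock period (arbitrary resolution)
N = 10 * PERIOD            # simulate ten test-clock periods
tc1 = [square(t, PERIOD) for t in range(N)]
tc2 = [square(t, PERIOD, delay=PERIOD // 4) for t in range(N)]  # 90 degrees later
xor = [a ^ b for a, b in zip(tc1, tc2)]
```

In this model the XOR output repeats every four samples while each input repeats every eight, so the output runs at twice the input frequency.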



17.3 On-Chip Clocks 

17.3.1 Clock Generation/Distribution Overview 

Figure 17-3 illustrates the overall structure of the clock generation/distribution system. The 
clocks are distributed across the chip in two stages. The global clock generator receives the 
master clock signal and generates the following global clocks that are driven to various sections 
of the NVAX chip: 

• eight single phase matched clocks (true and complement) K%PHI_1:4_H & L 

• four double phase matched clocks K%PHI_12:41_H 

• four NDAL matched clocks K_GLB%PHI12:41_OUT_H 

• two specially tuned single phase clocks K_P%PHI_3E and K_V%PHI_3E 

For purposes of defining clock specifications in different parts of the NVAX chip, clock section (or 
simply, section) is defined for the remainder of the chapter to be one of the chip sections shown 
in Table 17-1. 



Table 17-1: NVAX CPU Clock Sections 

Section Name            Symbol 

Cbox                    MCB 
Ebox                    IEB 
Fbox                    F 
Ibox                    I 
Mbox                    M 
Pcache                  P 
VIC                     V 
Upper I/O Pad Logic     PAD 
Lower I/O Pad Logic     PADL 
Global Reset Logic      E 

Figure 17-3: On-Chip Clock Distribution 
The global clock signals are received and driven into each section by local clock buffers. It is 
these local clocks that are used to control logic sequencing throughout the chip. Note that the 
active high single phases are used by all sections, while the double phases and active low single 
phases are used only in the Fbox. NDAL clocks are driven to the pads where they are buffered 
and driven off chip. 



1 where X is a clock section symbol. 
17.3.2 Global Clock Distribution 

In this two stage distribution scheme, clock generation and distribution are very tightly controlled 
at the global level. Delays seen by each section are minimized and equalized to reduce global 
skew. Global clock signals have matched buffer delays from the generator, matched interconnect, 
and matched section loads. 

Load matching on global clock signals is implemented using dummy loads - MOSFET capacitors 
added to global distribution lines to balance section driver loads seen by the global clock generator. 
Dummy loads are added to global clock signals at each section input, matching the section load 
on that signal with the most heavily loaded clock signal at that section input. Global routing of 
the clock signals is carefully controlled to both minimize RC delays and to match the delays of 
all signals arriving at a common receiver. 

To provide flexibility in global clock distribution, global clock signals are organized into four 
groups. Interconnect and loads are matched between the signals within each group. The four 
groups are designed to have very similar edge rates and delay characteristics. These groups 
consist of: 

1. K%PHI_1:4_H - active high CPU clocks 

2. K%PHI_1:4_L and K%PHI_12:41_H - active low and double phase CPU clocks 

3. K_GLB%PHI12:41_OUT_H - double phase NDAL clocks 

4. K_P%PHI_3E_H and K_V%PHI_3E_H - special CPU clocks 

17.3.3 Section Clock Distribution 

Section clock distribution rules are more flexible than global rules to allow for stringent routing 
requirements at the section level. Primary requirements for section-level distribution are 1) 
maximum 125 pS RC delay between section drivers and any receiver, and 2) adherence to NVAX 
methodology which specifies the use of only fully complementary receivers. A detailed description 
of the rules relating to the use of the NVAX on-chip clocks can be found in the NVAX CPU Chip 
Design Methodology document. 

17.3.4 Global Clock Waveforms 

Eight single phase and four double phase clock signals are globally distributed on the chip. Four 
NDAL clocks are driven to the pads where they are buffered and driven off chip. The single 
and double phase CPU clocks have a period of one NVAX cycle. The NDAL clock cycle is three 
NVAX cycles in length. Both rising and falling clock transitions occur at the boundaries of each of 
the four phases of an NVAX clock cycle. Waveforms for the globally-distributed clock signals are 
shown in Figure 17-4. The use of these global clock signals is RESTRICTED to interconnecting 
the section clock drivers. 

Clock signals K_P%PHI_3E_L and K_V%PHI_3E_L are used for sense amplifier timing within 
the Pcache and VIC, respectively. These signals are "early" versions of K%PHI_3_H and are 
carefully tuned in relation to other clock signals. For this reason, waveforms for these clocks are 
not depicted in Figure 17-4. These signals are discussed further in the next section. 
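The relationship between the four NVAX phases, the single and double phase clocks, and the slower NDAL phases can be sketched as a small lookup. The Python model below assumes that a single phase clock K%PHI_n_H is high during phase n, that a double phase clock such as K%PHI_12_H is high during both named phases, and that NDAL phase 1 starts aligned with NVAX phase 1. These are assumptions read from the signal names and the 3:1 cycle ratio, not statements taken from Figure 17-4.

```python
NEXT_PHASE = {1: 2, 2: 3, 3: 4, 4: 1}


def cpu_clocks(phase):
    """Assumed levels of the active-high CPU clocks during NVAX phase 1..4."""
    single = {f"K%PHI_{n}_H": int(phase == n) for n in (1, 2, 3, 4)}
    double = {
        f"K%PHI_{n}{NEXT_PHASE[n]}_H": int(phase in (n, NEXT_PHASE[n]))
        for n in (1, 2, 3, 4)
    }
    return {**single, **double}


def ndal_phase(elapsed_nvax_phases):
    """NDAL phase (1..4) after a number of elapsed NVAX phases.

    One NDAL cycle spans three NVAX cycles (12 NVAX phases), so each of the
    four NDAL phases lasts three NVAX phases.
    """
    return (elapsed_nvax_phases // 3) % 4 + 1
```

For example, `cpu_clocks(4)` reports K%PHI_4_H, K%PHI_34_H, and K%PHI_41_H as high, and `ndal_phase` returns to 1 after 12 NVAX phases.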
17.3.5 Section Clock Waveforms 

The section clocks are buffered versions of the globally distributed clock signals. Ten sections 
on the chip receive clocks K%PHI_1:4_H, while the Fbox is the only receiver of K%PHI_1:4_L 
and K%PHI_12:41_H. NDAL clocks are received only at the pads and are driven off chip as 
PHI12:41_OUT_H. 

Clock signals K_P%PHI_3E_L and K_V%PHI_3E_L are received only in the Pcache and VIC sections, 
respectively. These clocks are used to trigger sense amplifiers and must be tuned such that their 
buffered, section level edges precede the normal section level phase 3 edges (e.g. K_P%PHI_3_H 
and K_V%PHI_3_H) by approximately 12, nS. 

All section buffers have identical internal delays. To ensure this, standard clock drivers are used 
in each section (except the Fbox). The standard clock driver is designed to be used in a distributed 
fashion: multiple identical parallel drivers are used, with inputs, outputs, and primary internal 
nodes being individually strapped together within each section. 
17.3.6 Clock Skews and Rise/Fall Times of the Section Clocks 

Because of the tightly controlled delays in the first stage of the distribution network, clock skew 
specifications are the same in most sections of the chip. The only exception to this is in the Fbox, 
where leverage of the layout from the Rigel Fbox necessitates specification of a lower-tolerance 
skew. This higher skew figure is due to larger allowable RC delays in the Fbox section level clock 
distribution network. 

Table 17-2 specifies the skews and rise/fall times for the edges of the single phase clock signals. 
These values are for a TT part running at 100°C and 3.0 volts. Clock Skew is the uncertainty in 
time from when any clock edge crosses the 50% Vdd point to when any other clock edge crosses 
the 50% Vdd point. The rise and fall times are measured from the 10% to 90% points of the full 
voltage transition of the clock signal. Adjacent clock phases can overlap or underlap due to clock 
skew. 



Table 17-2: Skews and Rise/Fall Times (1) 

Skew Within        Skew Within   Skew Between       Skew Between    Rise/Fall 
Any Section (2)    Fbox          Any Two            Fbox and Any    Times 
                                 Sections (2)       Section 

0.5 nS             1.0 nS        0.5 nS             1.0 nS          0.5 nS 

17.4 The NDAL interface timing system 

17.4.1 NDAL Clocks 

The NVAX CPU provides four double phase low skew clocks that are used by the memory interface 
to communicate with the CPU via the NDAL. The NDAL runs at one third the speed of the internal 
CPU cycle. The NDAL clocks are generated by dividing the internal clock frequency by three. 
The interconnect used for these signals must be well controlled to avoid excessive delay, ringing, 
and skew. 

The relationship of the four clocks to the internal CPU clock cycle is shown in Figure 17-5. The 
timing diagram also indicates the timing of the NDAL signals. The NDAL changes in Φ12, is valid 
during Φ3, and goes tristate in Φ4. All NDAL signal transitions are referenced to the RISING 
transitions of the clocks. 



1 These skews are not valid for the NDAL clocks. See Section 17.4 for specific NDAL clock skew information. 

2 Excluding the Fbox. 
17.4.2 Controlling Inter-Chip Clock Skew 

The distribution of the NDAL clocks across the module is critical to the performance and 
functionality of the CPU. At the specified operating frequencies of the CPU, the module 
interconnect acts as a transmission line. It has a characteristic impedance and delay. The 
interconnect used for the clock signals must be carefully matched to avoid skew. Note that skews 
and signal delays are measured from the point where the waveform reaches VDD/2 (nominally 
1.65V). 

MODULE INTERCONNECT 

Detailed information concerning the design of the module interconnectivity can be 
found in the NVAX Module Signal Integrity Handbook. 



Figure 17-5: Relationship of Internal and NDAL Clock Cycles 
17.4.2.1 Self Skew 

Each NDAL clock is distributed to a number of receivers on the CPU board. In a perfect electrical 
environment each chip would receive the clocks at exactly the same time. Unfortunately, due to 
mismatched interconnect lengths and variations in the electrical properties of the interconnect, 
a clock signal will not arrive at the different receivers at the same time. For example, refer to 
Figure 17-6. The clock signal is driven from the NVAX CPU to four clock receivers. Due to 
interconnect length mismatches it will be received at points A, B, C, and D at different times. 
Figure 17-6: Self Skew 

The maximum difference in the arrival time of a particular clock transition at different locations 
is defined as the self-skew of the clock. Self-skew is the maximum possible difference between 
the actual clock transition and the specified clock transition. For the NVAX CPU to operate at 
its maximum performance the following rules must be obeyed. 

1. The rising transition of each NDAL clock occurs at any receiver within 1.0ns of when it occurs 
at any other receiver. For example, refer to the diagram above. The Φ12 rising transition 
occurs at point A, point B, point C and point D, and the transitions at each separate point 
occur within 1.0ns of the transitions at every other point. 

2. Rule 1 must also hold for NDAL falling edge transitions. 

These rules imply that if a clock transition appears at one receiver 0.5ns before the specified time, 
the same clock transition cannot appear at another receiver more than 0.5ns after the specified 
time: this would violate rule 1. 
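The self-skew rule reduces to a comparison between the earliest and latest arrival of one transition. The Python sketch below expresses that check. The receiver labels follow the A, B, C, D points of Figure 17-6, while the arrival values (in ns relative to the specified time) and the function names are made-up illustrations.

```python
def self_skew(arrival_ns):
    """Spread between the earliest and latest arrival of one clock transition."""
    times = list(arrival_ns.values())
    return max(times) - min(times)


def meets_self_skew_rule(arrival_ns, limit_ns=1.0):
    """True when every receiver sees the transition within limit_ns of the others."""
    return self_skew(arrival_ns) <= limit_ns


# Hypothetical arrival times, in ns relative to the specified transition time.
arrivals = {"A": -0.5, "B": 0.1, "C": 0.3, "D": 0.5}
ok = meets_self_skew_rule(arrivals)
```

The same function applies unchanged to falling transitions, since rule 2 imposes the same 1.0ns limit on them.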

17.4.2.2 Inter-Clock Skew 

At the clock receivers, each NDAL clock transition is specified to appear at some time relative 
to any of the other NDAL clock transitions. In an ideal design, all clock transitions would occur 
at the specified time. Unfortunately, due to device, processing, and interconnect mismatches, the 
clock signals will arrive at times different from those specified. The uncertainty in arrival times 
is defined as the inter-clock skew. For the NVAX CPU to operate at its maximum performance 
the following inter-clock skew rules must be obeyed. 

1. The skew between any two rising NDAL clock transitions at any two receivers is +/-0.5ns. 
For example, if the transitions are defined to be 15ns apart, the clock design guarantees that 
they are between 14.5 and 15.5 ns apart. 

2. Skew between falling clock transitions is +/-0.5ns. 

3. The skew between a rising transition and a falling transition is +/-0.75ns. 
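The three inter-clock skew rules can be captured in one tolerance table keyed by the kinds of the two transitions. The Python sketch below is an illustration under that reading. The function and table names are assumptions, and the measured values in the usage lines are invented for the example.

```python
# Allowed deviation (ns) from the specified separation, by transition kinds.
TOLERANCE_NS = {
    ("rise", "rise"): 0.5,
    ("fall", "fall"): 0.5,
    ("rise", "fall"): 0.75,
    ("fall", "rise"): 0.75,
}


def separation_ok(kind_a, kind_b, specified_ns, measured_ns):
    """Check a measured edge-to-edge separation against the skew rules."""
    return abs(measured_ns - specified_ns) <= TOLERANCE_NS[(kind_a, kind_b)]


# Two rising transitions specified 15 ns apart, as in the example in rule 1.
within = separation_ok("rise", "rise", 15.0, 15.5)
outside = separation_ok("rise", "rise", 15.0, 16.0)
```

A separation of 15.5 ns passes and 16.0 ns fails for two rising transitions, consistent with the 14.5 to 15.5 ns window given in rule 1.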

17.4.3 Driving and Receiving NDAL signals 

Detailed information regarding NDAL clocking and NDAL skew considerations can be found in 
Chapter 3. 




NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 



17.4.4 Information Transfer between the NDAL clock system and the on-chip 
clock system 

Detailed information regarding information transfer between the NDAL clock system and the 
on-chip clock system can be found in Chapter 13. 

17.5 Initializing the NVAX system. 

ASYNC_RESET_L is an asynchronous input to the NVAX chip. It is used to force the NVAX 
CPU into a known state. The assertion of ASYNC_RESET_L occurs during NVAX system 
initialization. ASYNC_RESET_L must be asserted for a minimum of 7 NDAL cycles. 

SYS_RESET_L is both an asynchronous and synchronous output. SYS_RESET_L is 
asynchronously asserted whenever ASYNC_RESET_L is asserted. When asserted, it places 
the NVAX system chips in their initial power-up states. 

SYS_RESET_L is asserted for a minimum of 7 NDAL cycles. The deassertion of the signal is 
synchronized to the NDAL clocks. It is deasserted on the rising edge of PHI12_OUT_H and is 
valid at the NDAL receivers in time to be latched in NDAL Φ4. Figure 17-7 shows the relationship 
between the ASYNC_RESET_L and SYS_RESET_L signals. 
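The 7 NDAL cycle minimum can be converted into an absolute time once the internal clock rate is known. The Python sketch below does this conversion using the 3:1 ratio between NDAL and NVAX cycles stated in Section 17.4.1. The function name is an illustrative assumption, and the 100 MHz value is the maximum internal clock speed quoted in Section 17.2.1.

```python
def min_reset_ns(internal_hz, ndal_cycles=7):
    """Minimum reset assertion time in ns for a given internal clock rate."""
    ndal_cycle_ns = 3 * 1e9 / internal_hz   # one NDAL cycle = three NVAX cycles
    return ndal_cycles * ndal_cycle_ns


reset_ns_at_100mhz = min_reset_ns(100e6)
reset_ns_at_50mhz = min_reset_ns(50e6)
```

At a 100 MHz internal clock this works out to about 210 ns; a part clocked more slowly needs a proportionally longer assertion.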

Figure 17-7: System Reset Timing 
17.5.1 Internal NVAX Reset 

The ASYNC_RESET_L pin is used to generate several internal reset signals which reset 
various parts of the NVAX chip. ASYNC_RESET_L is synchronized with NDAL Φ3, 
then latched after settling with NDAL Φ1. This synchronized signal is piped to NVAX 
Φ4 to produce K_PAD%SYNC_RESET. The internal, buffered version of ASYNC_RESET_L is 
K_PAD%ASYNC_RESET. 
To satisfy various logic timing constraints, several reset signals are produced and distributed 
throughout the NVAX chip. The primary internal NVAX reset is K%RESET. This signal 
is asserted asynchronously and deasserted synchronously following assertion/deassertion of 
ASYNC_RESET_L or DISABLE_OUT_L, or during the BSR External Test. Buffered versions 
of K%RESET are used by the Ebox, Ibox, VIC, and Fbox to reset local logic. Detailed information 
regarding the functions of DISABLE_OUT_L and the BSR External Test can be found in the 
NVAX Testability Specification. 

The Mbox, Pcache, and Cbox (excluding BIU) receive buffered versions of K_MC%RESET. This 
signal functions the same as K%RESET, except it is also asserted following an Ebox S3 timeout 
(see individual box chapters of this specification for detailed information). 

The I/O Pad logic receives buffered versions of K%EXT_RESET. This signal is the same as K%RESET, 
except it is not asserted with DISABLE_OUT_L or during BSR External Test as K%RESET is. 

K_CE%RESET is asserted during NDAL Φ3 and piped to NVAX internal <Pj. A buffered version of 
this is used to reset BIU logic in the Cbox to the proper NDAL sequencing state during the reset 
sequence (see CBOX chapter for detailed information). 
17.5.2 Generation of Clocks During Power-up 

The NVAX chip generates its internal clocks and the NDAL clocks by dividing down a high 
frequency external oscillator signal. The external system oscillator is powered from the module 
5 volt power supply. Its clock signals must be valid before 3 volt power is supplied to the NVAX 
chip. The oscillator takes a maximum of 10 mS of initialization time before its clocks can be 
considered free running. Hence, the module power supply must be designed to guarantee that 
the 3 volt supply is not valid until 10 mS after the 5 volt supply is stable. 

The NVAX clock generator derives free running clocks from the external oscillator clock. The 
clock generator is self-initializing and is not affected by the assertion of ASYNC_RESET_L, 
except for clock generator reset test features (see CLOCK GENERATOR RESET, next section). 
The clock generator requires a maximum of 3 oscillator clock cycles to initialize itself after the 
3 volt module power supply has become valid. Figure 17-8 shows the NVAX Chip and external 
oscillator power-up sequence. 
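The power sequencing requirement above is a single inequality between two supply events. The Python sketch below states it as a check that could be applied to bench measurements. The function and constant names are illustrative assumptions, and the times in the usage lines are invented.

```python
OSC_STARTUP_MS = 10.0   # maximum oscillator initialization time from the text


def supply_sequence_ok(t_5v_stable_ms, t_3v_valid_ms):
    """True when the 3 volt supply becomes valid at least 10 mS after 5 volts is stable."""
    return (t_3v_valid_ms - t_5v_stable_ms) >= OSC_STARTUP_MS


good = supply_sequence_ok(t_5v_stable_ms=2.0, t_3v_valid_ms=14.0)
bad = supply_sequence_ok(t_5v_stable_ms=2.0, t_3v_valid_ms=9.0)
```

The first case leaves 12 mS for the oscillator to start and passes; the second leaves only 7 mS and fails.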

Figure 17-8: Clock State During Initial Power-up 

[Timing diagram not reproduced. Signals shown: OSC_TEST_H, OSC_H, OSC_L, the internal chip 
clocks K%PHI_12_H, K%PHI_23_H, K%PHI_34_H, and K%PHI_41_H, and the NDAL system clocks 
PHI_12_OUT, PHI_23_OUT, PHI_34_OUT, and PHI_41_OUT. Annotations mark the interval in which 
OSC is indeterminate and an interval of 3 OSC cycles maximum.] 




17.5.3 Clock Generator Reset 

The NVAX chip incorporates a clock generator reset feature for use in verifying chip timing. The 
generator can be reset to a known cycle and phase in order to verify various signals against their 
specified timing. 
WARNING 

Use of the clock generator reset feature must follow these specific sequencing and 
timing constraints. Deviation from these specifications will have undesirable results, 
and can result in physical damage to the NVAX chip. Contact a member of the NVAX 
clock design team for further information about this feature. 

Figure 17-9 shows the proper signal timing for effecting a reset of the clock generator. To begin 
the clock generator reset sequence, the chip is powered up using normal high speed oscillator 
inputs supplied through OSC_H and OSC_L. This is the normal powerup mode, and allows the 
internals of the chip to reach a deterministic operating state. 

Following a normal powerup reset sequence, the oscillator input is turned off briefly (1-2 mS) 
to switch the oscillator input to the test clocks. Following the switch to the test clocks, the 
chip is again reset to restore any internal state lost during the test clock switch. Note that 
ASYNC_RESET_L is held asserted through the duration of the clock generator reset sequence. 

Following this second chip reset sequence, the test clocks are stopped briefly (500 nS MAX). The 
states of test clocks OSC_TC1_H and OSC_TC2_H when stopped must be the same, either both 
high or both low (as shown). TEST_DATA_H should be driven low as shown in Figure 17-9 to 
effect the clock generator reset. This immediately places the clock generator into NVAX Φ2 and 
NDAL Φ1. TEST_DATA_H is then driven high and clocking of the chip is resumed. On the first 
oscillator cycle following resumption of clocking, the generator will transition into NVAX Φ3 and 
begin normal sequencing. ASYNC_RESET_L must remain asserted for at least 7 NDAL cycles 
following resumption of clocking. 
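The numeric constraints in this sequence can be collected into a pre-flight check for a tester program. The Python sketch below covers only the three limits stated in the text (500 nS maximum stop time, matching stopped levels on the two test clocks, and 7 NDAL cycles of reset after restart). It is an illustrative assumption about how such a check might be organized, it does not model the full sequence, and it is no substitute for the timing in Figure 17-9 or for consulting the clock design team as the WARNING requires.

```python
def clock_reset_violations(stop_ns, tc1_level, tc2_level, reset_hold_ndal_cycles):
    """Return a list of violated constraints for a proposed reset sequence."""
    problems = []
    if stop_ns > 500:
        problems.append("test clocks stopped longer than 500 nS")
    if tc1_level != tc2_level:
        problems.append("OSC_TC1_H and OSC_TC2_H stopped at different levels")
    if reset_hold_ndal_cycles < 7:
        problems.append("ASYNC_RESET_L held fewer than 7 NDAL cycles after restart")
    return problems


# Hypothetical tester settings, for illustration only.
clean = clock_reset_violations(400, 0, 0, 8)
faulty = clock_reset_violations(600, 0, 1, 3)
```

An empty list means none of the three stated limits is violated; it does not by itself establish that a sequence is safe.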
Figure 17-9: Clock Generator Reset Timing 

[Timing diagram not reproduced. Rows shown: CPU Phase, NDAL Phase, OSC_H, OSC_L, OSC_TC1_H, 
OSC_TC2_H, INTERNAL OSC, and ASYNC_RESET_L. While the clocks are stopped the CPU phase is 
held at 2 and the NDAL phase at 1; after clocking resumes the CPU phase sequence is 
3 4 1 2 3 4 1 and the NDAL phase sequence is 1 2 3.] 

K_SEC%OSC1_H is the internal master clock produced from either the OSC_H and 
OSC_L inputs, or the OSC_TC1_H and OSC_TC2_H inputs. OSC_TEST_H 
is used to select the clock source as described in this clock specification. 

* indicates a static (non-changing) NDAL Φ1. 

Timing Notes: 

1. ECL pin inputs OSC_H and OSC_L must be used to supply clocks to chip prior 
to and during power-up. Inputs OSC_TC1_H and OSC_TC2_H must be held low in order to 
prevent latch-up. 

2. Switch to test clocks OSC_TC1_H and OSC_TC2_H. Start measure out lpat on chip tester. 

3. Clocks restarted to restore internal chip signals prior to clock-reset sequence. 

4. ASYNC_RESET_L must remain asserted for a minimum of 7 NDAL cycles 
following restart of clocks. 
17.6 NVAX Clock Section Signal/Pin Dictionary 
17.6.1 Schematic - Behavioral Translation 



Schematic Name (1)          Behavioral Model Name (1) 

- Signals 

K%EXT_RESET                 K%EXT_RESET 
K%PHI_1_H                   K%PHI_1_H 
K%PHI_2_H                   K%PHI_2_H 
K%PHI_3_H                   K%PHI_3_H 
K%PHI_4_H                   K%PHI_4_H 
K%PHI_1_L                   K%PHI_1_L 
K%PHI_2_L                   K%PHI_2_L 
K%PHI_3_L                   K%PHI_3_L 
K%PHI_4_L                   K%PHI_4_L 
K%PHI_12_H                  K%PHI_12_H 
K%PHI_23_H                  K%PHI_23_H 
K%PHI_34_H                  K%PHI_34_H 
K%PHI_41_H                  K%PHI_41_H 
K%RESET                     K%RESET 
K_CE%RESET                  K_CE%RESET 
K_GLB%PHI12_OUT_H           K%NDAL_PHI_12_H 
K_GLB%PHI23_OUT_H           K%NDAL_PHI_23_H 
K_GLB%PHI34_OUT_H           K%NDAL_PHI_34_H 
K_GLB%PHI41_OUT_H           K%NDAL_PHI_41_H 
K_MC%RESET                  K_MC%RESET 
K_P%PHI_3E                  non-existent (2) 
K_PAD%ASYNC_RESET           K_PAD%ASYNC_RESET 
K_PAD%SYNC_RESET            K_PAD%SYNC_RESET 
K_SEC%OSC1_H                module call (3) 
K_V%PHI_3E                  non-existent (2) 

- Pins 

ASYNC_RESET_L               P%ASYNC_RESET_L 
DISABLE_OUT_L               P%DISABLE_OUT_L 
OSC_TEST_H                  P%OSC_TEST_H 

(1) Signals without specified assertion levels may exist in _H and/or _L versions. 

(2) These signals are not modeled in the behavioral code. 

(3) Any transition is represented in behavioral model by a call to routine n_%master_clock_transition. 
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Schematic Name' 


Behavioral TVf r»flf»l Name' 


ncr tt 




nop t 


Tvy AOP T 


nor tpi xt 


DC.PiOP TV1 TT 


nop TOO XT 


Dff.PiOP TV"»0 TT 


PHI12_0UT_H 


P%Pffll2_OUT_H 


PHI23_OUT_H 


P%PHI23_OUT_H 


PHI34_OUT_H 


P%PHI34_OUT_H 


PHI41_0UT_H 


P%PHl41_OUT_H 


SYS_RESET_L 


P%SYS_RESET_L 


TEST_DATA_H 


P%TEST_DATA_H 



1 Signals without specified assertion levels may exist in _H and/or _L versions. 



17.6.2 Behavioral - Schematic Translation

Behavioral Model Name⁴    Schematic Name⁴

- Signals

K%EXT_RESET               K%EXT_RESET
K%PHI_1_H                 K%PHI_1_H
K%PHI_2_H                 K%PHI_2_H
K%PHI_3_H                 K%PHI_3_H
K%PHI_4_H                 K%PHI_4_H
K%PHI_1_L                 K%PHI_1_L
K%PHI_2_L                 K%PHI_2_L
K%PHI_3_L                 K%PHI_3_L
K%PHI_4_L                 K%PHI_4_L
K%PHI_12_H                K%PHI_12_H
K%PHI_23_H                K%PHI_23_H
K%PHI_34_H                K%PHI_34_H
K%PHI_41_H                K%PHI_41_H
K%RESET                   K%RESET
K_CE%RESET                K_CE%RESET
K%NDAL_PHI_12_H           K_GLB%PHI12_OUT_H
K%NDAL_PHI_23_H           K_GLB%PHI23_OUT_H
K%NDAL_PHI_34_H           K_GLB%PHI34_OUT_H
K%NDAL_PHI_41_H           K_GLB%PHI41_OUT_H
K_MC%RESET                K_MC%RESET
K_PAD%ASYNC_RESET         K_PAD%ASYNC_RESET
K_PAD%SYNC_RESET          K_PAD%SYNC_RESET

- Pins

P%ASYNC_RESET_L           ASYNC_RESET_L
P%DISABLE_OUT_L           DISABLE_OUT_L
P%OSC_TEST_H              OSC_TEST_H
P%OSC_H                   OSC_H
P%OSC_L                   OSC_L
P%OSC_TC1_H               OSC_TC1_H
P%OSC_TC2_H               OSC_TC2_H
P%PHI12_OUT_H             PHI12_OUT_H
P%PHI23_OUT_H             PHI23_OUT_H
P%PHI34_OUT_H             PHI34_OUT_H
P%PHI41_OUT_H             PHI41_OUT_H
P%SYS_RESET_L             SYS_RESET_L
P%TEST_DATA_H             TEST_DATA_H

⁴ Signals without specified assertion levels may exist in _H and/or _L versions.



17.7 Revision History 



Table 17-3: Revision History

Who             When           Description of change

Bill Bowhill    28-Jan-1990    Initial Release
Tim Fischer     28-Jan-1991    Pass 1 Updates Complete



DIGITAL CONFIDENTIAL 






Chapter 18 

Performance Monitoring Facility 



18.1 Overview 

The NVAX CPU chip contains a facility by which privileged software may obtain performance
information about the dynamic behavior of the CPU. The facility is implemented with a combination 
of hardware and microcode, and controlled by software using privileged instructions. 

Two 64-bit performance counters called PMCTR0 and PMCTR1 are maintained in memory for 
each CPU in the system. The lower 16 bits of each counter are implemented in hardware in the 
CPU, and at specified points, microcode updates the quadwords in memory with the contents of 
the hardware counters. 

The performance monitoring facility may be configured by privileged software to count a number 
of events in the system, from which performance analysis data such as cache and TB hit rates, 
cycles-per-instruction, and stall frequencies may be calculated. 

18.2 Software interface to the Performance Monitoring Facility 

The performance monitoring facility makes use of a data structure in memory, and must be 
configured and enabled via a location in the System Control Block, processor register references, 
and the LDPCTX instruction. 

18.2.1 Memory Data Structure 

The two 64-bit performance counters for each CPU are maintained in a data structure in memory. 
This data structure consists of a pair of quadwords for every CPU in the system. The physical 
address of the base of the data structure is obtained from offset 58 (hex) in the System Control 
Block. The format of this location is shown in Figure 18-1. 
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Figure 18-1: Performance Monitoring Data Structure Base Address 



 31                                                        04 03  02  01  00
+------------------------------------------------------------+---+---+---+---+
| Physical Address of Performance Monitoring Data Structure  |SBZ| 0 | 1 | 1 | :SCB+58 (hex)
+------------------------------------------------------------+---+---+---+---+



NOTE 

A quadword-aligned physical base address is constructed by clearing the lower 3 bits 
of the longword fetched from offset 58 (hex) in the SCB. Microcode will not update the 
block in memory unless bits <2:0> of this longword contain 011 (binary). If these bits 
are found to contain another value, a machine check with code MCHK_PMF_CONFIG 
is performed to notify software that the performance monitoring facility was incorrectly 
configured. It is strongly suggested that the physical address be at least octaword 
aligned, and preferably page aligned. 

The address of the pair of quadwords for an individual CPU is computed by shifting the CPUID 
value left 4 bits and adding this value to the base address. This calculation is shown in equation 
form below (all numbers in these equations are hex). 

phys.base.addr = SCB[58] AND FFFFFFF0; 
phys.block.addr = {CPUID LSHIFT 4} + phys.base.addr; 
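
The calculation above can be sketched in a few lines of Python. This is an illustration only, not microcode: the function name and the use of an exception to stand in for the MCHK_PMF_CONFIG machine check are assumptions made for the example, and the mask is taken directly from the equation above.

```python
REQUIRED_LOW_BITS = 0b011   # value microcode expects in bits <2:0> of the longword


def pmf_block_address(scb_longword: int, cpuid: int) -> int:
    """Return the physical address of the counter pair for one CPU.

    scb_longword is the longword fetched from offset 58 (hex) in the SCB.
    """
    if (scb_longword & 0b111) != REQUIRED_LOW_BITS:
        # The chip reports this case with machine check MCHK_PMF_CONFIG;
        # a Python exception stands in for it here (assumption).
        raise ValueError("bits <2:0> of SCB+58 (hex) are not 011 (binary)")
    phys_base_addr = scb_longword & 0xFFFFFFF0      # mask from the equation
    return (cpuid << 4) + phys_base_addr            # 16 bytes per CPU
```

For example, with a longword of 00012003 (hex) and a CPUID of 2, the block for that CPU starts at 00012020 (hex).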



The format of the pair of quadwords for each CPU is shown in Figure 18-2. 



Figure 18-2: Per-CPU Performance Monitoring Data Structure 



 31                                                            00
+----------------------------------------------------------------+
|                     PMCTR0, low longword                       | :+00
+----------------------------------------------------------------+
|                     PMCTR0, high longword                      | :+04
+----------------------------------------------------------------+
 63                                                            32

 31                                                            00
+----------------------------------------------------------------+
|                     PMCTR1, low longword                       | :+08
+----------------------------------------------------------------+
|                     PMCTR1, high longword                      | :+12
+----------------------------------------------------------------+
 63                                                            32



18.2.2 Memory Data Structure Updates 

When the performance monitoring facility is enabled, the memory data structure is updated from 
the hardware counters if the PMCTR0 counter is more than half full and the current processor 
IPL is below 1B (hex), if a LDPCTX instruction is executed and the PME bit in the new PCB is 
off, or if the performance monitoring facility is disabled via a write to the PME processor register. 
The PME bit is internally implemented as ECR<PMF_ENABLE>, with conversion handled by 
microcode. 

When the PMCTR0 counter reaches half full, an interrupt at IPL 1B (hex) is requested. This 
interrupt request is serviced like any other interrupt if the IPL of the processor is below the 
IPL of the interrupt request. Like any other interrupt, it is serviced between instructions (or in 
the middle of the interruptible string instructions). Unlike other interrupts, the performance 
monitoring interrupt is serviced entirely by microcode, with no software interrupt handler 
required. 

When a performance monitoring interrupt occurs, microcode temporarily disables the facility, 
reads and clears the hardware counters, then updates the memory data structure with the 
hardware counts. The facility is then re-enabled, the interrupt is dismissed, and the interrupted 
instruction stream is restarted. 

NOTE 

Although the performance monitoring facility is disabled during the memory update 
process, it is re-enabled for the restart of the interrupted instruction stream. Therefore, 
depending on what events were selected, the facility may count events that are part of 
the restart process. 

At the maximum rate (one increment every 14ns CPU cycle), an interrupt is requested every 459 
microseconds. 
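
The 459 microsecond figure follows from the interrupt being requested when PMCTR0 increments into bit <15>, that is, after 32768 increments at one increment per 14ns cycle. A small worked check, with constant and function names chosen only for this illustration:

```python
CPU_CYCLE_NS = 14            # CPU cycle time quoted in the text
HALF_FULL_COUNT = 1 << 15    # PMCTR0 increments into bit <15> (32768 counts)


def min_interrupt_interval_us(cycle_ns: float = CPU_CYCLE_NS) -> float:
    """Shortest time between interrupt requests, in microseconds,
    assuming PMCTR0 increments on every cycle."""
    return HALF_FULL_COUNT * cycle_ns / 1000.0
```

With a 14ns cycle this evaluates to 458.752 microseconds, which rounds to the 459 microseconds quoted above.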

If a LDPCTX is executed and the PME bit in the new PCB is off, or if the performance 
monitoring facility is disabled via a write to the PME processor register, the microcode disables 
the performance monitoring facility, reads and clears the hardware counters, and updates the 
memory data structure for the CPU with the hardware counts. 

NOTE 

The hardware counters are not cleared, and the memory data structures are not 
updated when the performance monitoring facility is disabled via a direct write to 
ECR<PMF_ENABLE>. 

18.2.3 Configuring the Performance Monitoring Facility 

Before the performance monitoring facility is enabled, software must select the source of the event 
to be counted. This is accomplished first by selecting the box that reports the event, and then by 
selecting the event that is to be counted. The box selection is made by writing to the PMF_PMUX 
field in the ECR processor register, as indicated by Table 18-1. 

Table 18-1: Performance Monitoring Facility Box Selection 

ECR<PMF_PMUX>
(binary)         Source of Information

00               Ibox
01               Ebox
10               Mbox
11               Cbox



The event selection within the box is made by writing to a processor register within the box, as 
described in subsequent sections, and in the box chapters elsewhere in this specification. 

The hardware used to implement the 16-bit counters is constructed such that the PMCTR1 
counter increments only if both its selected event and the PMCTR0 selected event are true 
simultaneously. As such, PMCTR1 is a strict subset of PMCTR0. As a result, some combinations 
of event selections will not cause PMCTR1 to be incremented. In some boxes, the event selection 
is specified in such a way that compatible events are automatically selected. In other boxes, the 
user must specify compatible events. Where they are required, compatible events are described 
in the sections below. 
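
The subset relationship between the two counters can be illustrated with a short behavioral sketch. It is not a model of the actual logic; the function name and the per-cycle list of event pairs are assumptions made for the example.

```python
def count_events(samples):
    """Count events the way the paragraph above describes.

    samples is an iterable of (pmctr0_event, pmctr1_event) booleans,
    one pair per cycle. PMCTR1 counts only in cycles where the PMCTR0
    event is also true.
    """
    pmctr0 = 0
    pmctr1 = 0
    for ev0, ev1 in samples:
        if ev0:
            pmctr0 += 1
            if ev1:
                pmctr1 += 1
    return pmctr0, pmctr1
```

A cycle in which only the PMCTR1 event is true contributes to neither counter, which is why incompatible event selections leave PMCTR1 at zero.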

18.2.3.1 Ibox Event Selection 

The Ibox reports only one event, so if the Ibox is selected, that event is also selected. The Ibox 
inputs to the PMCTR0 and PMCTR1 hardware counters are shown in Table 18-2. 



Table 18-2: Ibox Event Selection 



PMCTR0 Input     PMCTR1 Input     Description; Use

VIC Access       VIC Hit          VIC hits compared to total VIC accesses; VIC hit ratio.



18.2.3.2 Ebox Event Selection 

The Ebox reports several events, as selected by the PMF_EMUX field in the ECR processor 
register. The Ebox inputs to the PMCTR0 and PMCTR1 counters are shown in Table 18-3. 

Table 18-3: Ebox Event Selection 



ECR<PMF_EMUX>
(binary)        PMCTR0 Input    PMCTR1 Input    Description; Use

000             Cycles          S3 Stall        S3 stalls (source queue, MD, Wn, Fbox scoreboard
                                                hit, Fbox input) compared to total cycles; S3 stalls
                                                per unit time.

001             Cycles          EM+PA queue     EM latch and PA queue stalls compared to total
                                Stall           cycles; EM+PA queue stalls per unit time.

010             Cycles          Instruction     Ebox and Fbox instructions retired compared to total
                                Retire          cycles; CPI.

011             Cycles          Total stall     Total Ebox stalls compared to total cycles; Stalls per
                                                unit time.

100             Total stall     S3 Stall        S3 stalls compared to total stalls; S3 stalls as a
                                                percentage of all stalls.

101             Total stall     EM+PA queue     EM latch and PA queue stalls compared to total
                                Stall           stalls; EM and PA queue stalls as a percentage of
                                                all stalls.

111             S5 Microword    S5 Microword    Number of times a microinstruction whose MISC
                event           event           field contained INCR.PERF.COUNT reached S5. By
                                                using the patchable control store, one may count
                                                microcode events by setting the MISC field of selected
                                                microwords to this value. If this event is selected,
                                                writing to the PMFCNT processor register will
                                                increment the counters via the MISC field decode.
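
As an illustration of how the counter pairs in Table 18-3 might be turned into the quantities named in the "Use" column, the sketch below computes a ratio and a cycles-per-instruction value from a pair of accumulated counts. The function names are hypothetical; only the pairing of counters with events comes from the table.

```python
def event_ratio(pmctr0: int, pmctr1: int) -> float:
    """Fraction of PMCTR0 events for which the PMCTR1 event was also true,
    for example S3 stalls per cycle (selection 000)."""
    if pmctr0 == 0:
        raise ValueError("no PMCTR0 events were counted")
    return pmctr1 / pmctr0


def cycles_per_instruction(cycles: int, retired: int) -> float:
    """CPI for selection 010: PMCTR0 counts cycles, PMCTR1 counts
    instruction retires."""
    if retired == 0:
        raise ValueError("no instructions were retired")
    return cycles / retired
```

For instance, 1000 cycles with 400 retires gives a CPI of 2.5.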



18.2.3.3 Mbox Event Selection 

The Mbox reports several events, as selected by the PMM field in the PCCTL processor register. 
The Mbox inputs to the PMCTR0 and PMCTR1 counters are shown in Table 18-4. 



Table 18-4: Mbox Event Selection 



PCCTL<PMM>
(binary)    PMCTR0 Input       PMCTR1 Input        Description; Use

000         S0 I-stream TB     S0 I-stream TB      TB hits for S0 I-stream references compared to total
            access             hit¹                TB accesses for S0 I-stream references; S0 I-stream
                                                   TB hit ratio.

001         S0 D-stream TB     S0 D-stream TB      TB hits for S0 D-stream references compared to total
            access             hit¹                TB accesses for S0 D-stream references; S0 D-stream
                                                   TB hit ratio.

010         P0/P1 I-stream     P0/P1 I-stream      TB hits for P0 and P1 I-stream references compared
            TB access          TB hit¹             to total TB accesses for P0 and P1 I-stream
                                                   references; P0/P1 I-stream TB hit ratio.

011         P0/P1 D-stream     P0/P1 D-stream      TB hits for P0 and P1 D-stream references compared
            TB access          TB hit¹             to total TB accesses for P0 and P1 D-stream
                                                   references; P0/P1 D-stream TB hit ratio.

100         I-stream Pcache    I-stream Pcache     Pcache hits for I-stream references compared to total
            access             hit                 Pcache accesses for I-stream references; I-stream
                                                   Pcache hit ratio.

101         D-stream           D-stream            Pcache hits for D-stream references compared to
            Pcache access      Pcache hit          total Pcache accesses for D-stream references;
                                                   D-stream Pcache hit ratio.

110                                                Selection causes UNPREDICTABLE behavior of the
                                                   performance monitoring hardware.

111         Total reads and    Unaligned reads     Unaligned virtual reads and writes compared to total
            writes             and writes          virtual reads and writes; Unaligned references as a
                                                   percentage of all references.

¹ TB hit count is unconditionally incremented when MAPEN=0.
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18.2.3.4 Cbox Event Selection 

The Cbox reports several events, as selected by the PM_ACCESS_TYPE and PM_HIT_TYPE 
fields in the CCTL processor register. The Cbox inputs to the PMCTR0 counter are shown in 
Table 18-5 and the Cbox inputs to the PMCTR1 counter are shown in Table 18-6. For the 
Cbox, all of the PMCTR1 selections shown in Table 18-6 are compatible with all of the PMCTR0 
selections shown in Table 18-5. 

Table 18-5: Cbox PMCTRO Event Selection 

CCTL<PM_ACCESS_TYPE>
(binary)    PMCTR0 Input

000         Bcache coherency access. PMCTR0 increments when the Bcache processes any
            coherency request from the NDAL.

001         Bcache coherency READ access. PMCTR0 increments when the Bcache processes an
            IREAD or DREAD coherency request from the NDAL.

010         Bcache coherency OREAD access. PMCTR0 increments when the Bcache processes an
            OREAD or WRITE coherency request from the NDAL.

011         Selection causes UNPREDICTABLE behavior of the performance monitoring hardware.

100         Bcache CPU access. PMCTR0 increments when the Bcache processes any reference
            from the CPU.

101         Bcache CPU IREAD access. PMCTR0 increments when the Bcache processes an
            instruction-stream read request from the CPU.

110         Bcache CPU DREAD access. PMCTR0 increments when the Bcache processes a
            data-stream read, or read-with-modify-intent request from the CPU.

111         Bcache CPU OREAD access. PMCTR0 increments when the Bcache processes a
            data-stream read lock, write, or write unlock request from the CPU.



Table 18-6: Cbox PMCTR1 Event Selection 

CCTL<PM_HIT_TYPE>
(binary)    PMCTR1 Input

00          Bcache hit. PMCTR1 increments when a Bcache access results in any hit.

01          Bcache hit owned. PMCTR1 increments when a Bcache access results in an owned hit.

10          Bcache hit valid. PMCTR1 increments when a Bcache access results in a valid hit.

11          Bcache miss owned. PMCTR1 increments when a Bcache access results in a miss in
            which both the valid and owned bits were set.



18.2.4 Enabling and Disabling the Performance Monitoring Facility 

The performance monitoring facility is enabled or disabled by setting or clearing the Performance 
Monitor Enable (PME) bit in the CPU. This bit may be written in one of three ways: with a write 
to the PME processor register, by loading a new value with a LDPCTX instruction from the PME 
bit in the new PCB, or by a direct write of the ECR<PMF_ENABLE> bit. 
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The format of the PME processor register is shown in Figure 18-3. 



Figure 18-3: IPR 3D (hex), PME 



 31                                                        01 00
+------------------------------------------------------------+--+
|                            SBZ                             |  | :PME
+------------------------------------------------------------+--+
                                                               |
                                                     ENABLE ---+



If PME<0> is written with a 1, the performance monitoring facility is enabled. If PME<0> 
is written with a 0, the performance monitoring facility is disabled. Direct writes to 
ECR<PMF_ENABLE> are similar to writes to PME<0>, with the exception that the hardware 
counters are not automatically cleared, and the memory counters are not updated on an explicit 
write to ECR<PMF_ENABLE>. 

The CPU PME bit is also loaded by the LDPCTX instruction from PCB+92<31>. 



CAUTION

The longword at offset 58 (hex) from the SCB and the correct unique CPUID value for 
each CPU must be initialized before the performance monitoring facility is enabled. 
Failure to do so will result in UNDEFINED behavior of the system. 

The CPU PME bit is cleared, and the performance monitoring facility is disabled, at powerup. 

18.2.5 Reading and Clearing the Performance Monitoring Facility Counts 

In normal operation, microcode automatically updates the memory counters by reading the 
current value of the hardware counters, adding these values to the memory counters, and clearing 
the hardware counters. This is the preferred mode of operation. 

However, there may be some situations in which software wishes to directly read or clear the 
hardware counters. The current value of the hardware counters may be read from the PMFCNT 
processor register, whose format is shown in Figure 18-4. 

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Figure 18-4: IPR 7B (hex), PMFCNT in PMF Format 



 31                            16 15                            00
+--------------------------------+--------------------------------+
| Current Hardware PMCTR1 Value  | Current Hardware PMCTR0 Value  | :PMFCNT
+--------------------------------+--------------------------------+



The current value of the 16-bit hardware PMCTR1 counter is returned in PMFCNT<31:16> and 
the current value of the 16-bit hardware PMCTR0 counter is returned in PMFCNT<15:0>. 
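
A value read from PMFCNT can be split into the two hardware counts as sketched below. The function name is an assumption for the example; the field positions are those given in the text.

```python
def unpack_pmfcnt(pmfcnt: int):
    """Split a 32-bit PMFCNT value into (PMCTR0, PMCTR1) hardware counts."""
    pmctr0 = pmfcnt & 0xFFFF            # PMFCNT<15:0>
    pmctr1 = (pmfcnt >> 16) & 0xFFFF    # PMFCNT<31:16>
    return pmctr0, pmctr1
```

For example, a PMFCNT value of 00128001 (hex) holds a PMCTR0 count of 8001 (hex) and a PMCTR1 count of 0012 (hex).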

The two 16-bit hardware counters may be explicitly cleared by software by writing a 1 to 
ECR<PMF_CLEAR>. If the counters are explicitly cleared, any outstanding interrupt request 
is also cleared. It is strongly suggested that the hardware counters not be cleared while the 
performance monitoring facility is enabled. 

If the performance monitoring facility is configured to select the Ebox microword event 
(ECR<PMF_PMUX>=Ebox, ECR<PMF_EMUX>=S5 microword event, ECR<PMF_ENABLE>=1), 
a write of any value to the PMFCNT processor register will increment both hardware counters. 

TEST NOTE 

The performance monitoring facility hardware incrementers may be tested by clearing 
them via ECR<PMF_CLEAR>, selecting the Ebox S5 microword event, and enabling 
the facility. Each write to the PMFCNT processor register will then increment both 
hardware counters, and the result may be observed by reading the PMFCNT register. 
The interrupt request may be tested by incrementing the PMCTR0 hardware counter 
into bit<15>, which will cause an interrupt to be requested. 

NOTE 

If the 16-bit hardware counters are explicitly cleared by writing a 1 to 
ECR<PMF_CLEAR>, any count in these registers is lost and will not be included in 
the memory counters. 

CAUTION 

The performance monitoring hardware also provides the WBUS LFSR function under 
control of ECR<PMF_LFSR>. The operation of the hardware is UNDEFINED if both 
ECR<PMF_ENABLE> and ECR<PMF_LFSR> are on, or if software uses a single 
MTPR write to turn off one bit and turn on the other simultaneously. That is, if 
either bit is on, software must turn off both bits with one MTPR and turn on the other 
with a second MTPR. 



18.3 Hardware and Microcode Implementation of the Performance Monitoring 
Facility 

The performance monitoring facility is implemented via both CPU chip hardware and microcode. 
A block diagram of the performance monitoring hardware is shown in Figure 18-5. 



NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 

Figure 18-5: Performance Monitoring Hardware Block Diagram 

The lower 16 bits of the PMCTR0 and PMCTR1 performance counters are implemented as two 
16-bit incrementers in the Ebox. Both incrementers have a common clear line which is driven 
from an S5 decode of MISC/CLR.PERF.COUNT, and each has a separate carry-in input to cause an 
increment in the appropriate counter. The 32-bit concatenated value from the incrementers can 
be read onto E_BUS%ABUS_L (the active-low variant of E%ABUS_H), and the upper bit of PMCTR0 
is used to generate E_PMN%PMON_L, the performance monitoring facility interrupt request. 

The PMCTR0 and PMCTR1 carry-in inputs are supplied by PMUX0 and PMUX1, with the 
PMCTR1 carry-in signal gated with the PMCTR0 carry-in signal. This makes the PMCTR1 counter 
a strict subset of the PMCTR0 counter. Increments of both counters are suppressed if the 
performance monitoring facility is not enabled, or if the PMCTR0 counter has reached its 
maximum value. 

The top-level selection of events is determined by ECR<PMF_PMUX>, which selects the source 
to PMUX0 and PMUX1. This selects the source (Ibox, Ebox, Mbox, Cbox) of the carry-in signals 
to each counter. Distributed in the appropriate boxes are second-level muxes which are selected 
to provide the actual source of the increment events for PMCTR0 and PMCTR1. 

18.3.1 Hardware Implementation 

The two 16-bit hardware counters are implemented as side-by-side incrementers in the Ebox 
datapath (this hardware also implements the Wbus LFSR reducer that is described in the 
testability section of Chapter 8). The carry-in signals for each of the counters are driven from two 
4-to-1 muxes that are selected by ECR<PMF_PMUX>, and which select the appropriate source 
of inputs to the incrementers. 

Logic in the Ibox, Mbox, and Cbox selects the appropriate values to drive the two carry-in signals 
based on processor register fields in the box. The Ebox carry-in signals are selected locally and 
provide the fourth input to the muxes. The PMCTR1 carry-in signal is forced to be a subset of the 
PMCTR0 carry-in signal by ANDing the raw PMCTR1 carry-in signal with the PMCTR0 carry-in 
signal to produce the final PMCTR1 carry-in signal. 

Because the PMCTR1 increment is a strict subset of the PMCTR0 increment, the ultimate source 
of the two carry-in signals aligns them such that they are valid in the same cycle. For example, 
if the selected conditions are IREAD PCACHE ACCESS and PCACHE HIT, these two signals are 
valid in the same cycle, and they refer to the same reference. Therefore the assertion of IREAD 
PCACHE ACCESS is delayed until the cycle in which PCACHE HIT is valid. In addition to 
this, the source of the carry-in signals guarantees that any events that may be retried are only 
recorded once. For example, a particular Pcache access causes only one increment, even if it is 
retried multiple times. 

When the 16-bit PMCTR0 counter increments into the high-order bit, an interrupt is requested by 
asserting the E_PMN%PMON_L signal to the interrupt section, unless the hardware is configured 
to enable LFSR mode. This signal is sampled by edge-sensitive logic, so the interrupt request is 
maintained until it is cleared by writing a 1 to the appropriate bit in the INT.SYS register, even 
if the performance monitoring facility hardware counters are subsequently cleared. 

When the 16-bit PMCTR0 incrementer reaches its maximum value, subsequent increments of 
either counter are inhibited by blocking the clocks to the logic when a carry-out is detected 
from PMCTR0. In normal operation, this should not occur, but the counter may overflow if the 
interrupt request isn't serviced within several hundred microseconds, as would be the case if 
software spent an extended period of time at a high IPL with the performance monitoring facility 
enabled. 
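
The counter behavior described in this section (gated PMCTR1 carry-in, an edge-triggered interrupt request when PMCTR0 increments into bit <15>, and inhibited increments once PMCTR0 is at its maximum) can be summarized in a small behavioral sketch. The class and attribute names are assumptions for illustration, and treating FFFF (hex) as the point at which counting stops is an interpretation of "maximum value", not a statement about the actual clock-blocking logic.

```python
class PmfCounters:
    """Behavioral sketch of the two 16-bit incrementers (not the real logic)."""

    MAX_COUNT = 0xFFFF       # assumed saturation point for PMCTR0
    INTERRUPT_BIT = 0x8000   # high-order bit of the 16-bit counter

    def __init__(self):
        self.pmctr0 = 0
        self.pmctr1 = 0
        self.enabled = False
        self.interrupt_latched = False

    def cycle(self, ev0: bool, ev1: bool) -> None:
        """Apply one cycle of raw carry-in signals."""
        if not self.enabled or self.pmctr0 == self.MAX_COUNT:
            return                       # increments suppressed
        if not ev0:
            return                       # PMCTR1 carry-in is ANDed with PMCTR0's
        before = self.pmctr0
        self.pmctr0 += 1
        if ev1:
            self.pmctr1 += 1
        # The request is edge-sensitive: latch it on the 0 -> 1 change of bit <15>.
        if not (before & self.INTERRUPT_BIT) and (self.pmctr0 & self.INTERRUPT_BIT):
            self.interrupt_latched = True

    def clear(self) -> None:
        """Clear both counters; the latched request is not affected."""
        self.pmctr0 = 0
        self.pmctr1 = 0

    def acknowledge_interrupt(self) -> None:
        """Stand-in for writing a 1 to the appropriate INT.SYS bit."""
        self.interrupt_latched = False
```

Because PMCTR1 can never exceed PMCTR0 in this model, stopping on PMCTR0 alone is enough to keep both counts within 16 bits.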

The 32-bit concatenated value of the two 16-bit hardware incrementers can be read onto 
E_BUS%ABUS_L when selected by an S3 decode of A/PERF.COUNT. This is the mechanism by 
which microcode retrieves the current values of the two incrementers. The 32-bit concatenated 
value is cleared by an S5 decode of MISC/CLR.PERF.COUNT. The clear is done independent of 
whether the logic is enabled for performance counting or LFSR mode. 

18.3.2 Microcode Interaction with the Hardware 

There are several points at which the microcode interacts with the performance monitoring facility 
hardware. At powerup, microcode clears both of the 16-bit hardware incrementers and any 
potential interrupt request. 

MICROCODE RESTRICTION 

If the performance monitoring facility hardware incrementers are cleared in cycle 'n' via 
MISC/CLR.PERF.COUNT, INT.SYS<28> must be written with a 1 no earlier than cycle 
'n+3' to guarantee that the interrupt request is cleared. This delay is due to latency 
introduced between the performance monitoring facility hardware and the interrupt 
section. 

Microcode reads the current value of the hardware incrementers via A/PERF.COUNT as a 
byproduct of a read of the PMFCNT processor register, and as part of the process of updating the 
memory counters. 

Microcode clears the hardware incrementers via MISC/CLR.PERF.COUNT when 
ECR<PMF_CLEAR> is written with a 1. Microcode also clears the incrementers after reading 
and updating the memory counters. 

Microcode uses the CPUID processor register value to find the pair of quadwords that contain 
the performance counter values for this CPU. This value must be correctly initialized by either 
console firmware or software before the performance monitoring facility is enabled. The operation 
of the processor is UNDEFINED if CPUID is not correctly initialized. 

The memory counters are updated under three circumstances: when a performance monitoring 
facility interrupt is serviced, when the facility is disabled via a write to the PME processor register, 
and when the facility is disabled by loading a new value of PME in LDPCTX. The memory updates 
are done in a common subroutine by disabling the facility by clearing ECR<PMF_ENABLE>, 
reading the current value of the hardware incrementers and then clearing them, and updating 
each quadword in memory with the appropriate 16-bit hardware value. 
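
The sequence performed by the common update subroutine can be sketched as follows. This is a software illustration only: the dictionaries standing in for the hardware state and the memory quadwords, the function name, and the wrap of the memory counters at 64 bits are all assumptions made for the example.

```python
MASK64 = (1 << 64) - 1   # memory counters are quadwords; wrap-around is assumed


def update_memory_counters(state: dict, memory: dict) -> None:
    """Fold the 16-bit hardware counts into the 64-bit memory counters.

    state  holds "enabled", "pmctr0" and "pmctr1" (the hardware side).
    memory holds "PMCTR0" and "PMCTR1" (the per-CPU quadwords).
    """
    state["enabled"] = False                      # clear ECR<PMF_ENABLE>
    hw0, hw1 = state["pmctr0"], state["pmctr1"]   # read the incrementers
    state["pmctr0"] = 0                           # then clear them
    state["pmctr1"] = 0
    memory["PMCTR0"] = (memory["PMCTR0"] + hw0) & MASK64
    memory["PMCTR1"] = (memory["PMCTR1"] + hw1) & MASK64
```

In this sketch the caller decides whether to set "enabled" again afterwards, which mirrors the difference between the interrupt case (facility re-enabled) and the two disable cases.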
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18.4 Revision History 



Table 18-7: Revision History

Who           When           Description of change

Mike Uhler    12-Sep-1990    Reverse the definition of the TB selections for the Mbox
                             performance monitoring mux
Mike Uhler    12-Jan-1990    Initial release
Mike Uhler    02-Jul-1990    Update to reflect implementation
Mike Uhler    13-Feb-1991    Update to reflect pass 1 design
Mike Uhler    12-Aug-1991    Minor updates to clarify interrupt request
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Chapter 19 

Testability Micro-Architecture 



19.1 Chapter Overview 

This chapter describes the NVAX CPU chip's Testability Micro-Architecture — a framework of 
testability features implemented throughout the NVAX CPU chip. 

The chapter does not detail the motivation for testability features or discuss the actual method 
of their use in the various life cycle testing phases. These are covered elsewhere (for example, 
see [1]). 

19.2 The Testability Strategy 

The NVAX CPU chip's testability strategy addresses the broad issue of providing cost-effective 
and thorough testing during many life cycle testing phases. The strategy specifically implements 
test features to support: 

• chip debug 

• high fault coverage test at wafer probe and packaged chip test 

• "reduced probe contact" wafer probe test 

• effective chip burn-in test 

• module interconnection test via boundary scan, and in-circuit test (ICT) via a single-pin 
tristate feature. 

The strategy uses a combination of a variety of testability techniques and approaches that are best 
suited to address the specific functional elements in the chip. The cost-effective implementation 
is realized by the appropriate consideration of global issues, by unifying the test objectives, by 
sharing test resources and by exploiting features inherent in the chip. The strategy also relies 
on leveraging off the design verification patterns in developing production test patterns to meet 
the fault coverage goals. 

The test features are implemented such that they have no effect on the targeted performance of 
the chip. 
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19.3 Test Micro-Architecture Overview 

The NVAX CPU chip's Test Micro-Architecture consists of two principal elements: the Test Interface 
Unit and the Testability Features. 

Test Interface Unit 

The Test Interface Unit (TIU) implements a comprehensive test access strategy for the NVAX 
CPU. It permits efficient access to the testability features implemented on the chip. 

Figure 19-1: Test Interface Unit 

The TIU shown in Figure 19-1 consists of three ports: an IEEE P1149.1 (JTAG) serial test port, 
a parallel test port, and an "invisible" port consisting of test pads. The serial test port is a 4-pin 
dedicated test access port conforming to the IEEE P1149.1 (JTAG) standard. It is used for 
accessing the boundary scan register. 

The parallel test port consists of 15 dedicated pins. This port is used for accessing internal scan 
registers and test features which benefit from parallel access (for example, microaddress bus). 

The Test Pads primarily facilitate micro-probing during chip debug. These pads are located at 
strategic nodes throughout the chip. 

The NVAX CPU also has a special 2-pin serial port, consisting of TEST_DATA_H and 
TEST_STROBE_H, that allows the PCache to be loaded serially under control of special microcode. 
This feature has been provided to support convenient self-test operation during the chip burn-in 
test. For more details see Section 19.7. 
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In addition to these test ports, NVAX also uses the normal system port (pins) for test access. This 
access consists of using the VAX instructions to manipulate a testability feature or to perform 
the actual tests on the chip's logic. 

Table 19-1 summarizes the dedicated test pins for NVAX. 



Table 19-1: NVAX CPU's Test Pins

Pin Name                 Pin Type                         Pin Function
TDI_H                    Input, pull-up                   IEEE 1149.1 Serial Test Data Input
TDO_H                    Output, tri-state, 2 receivers   IEEE P1149.1 Serial Test Data Output
TMS_H                    Input, pull-up                   IEEE 1149.1 Test Mode Select
TCK_H                    Input, pull-down                 IEEE 1149.1 Test Clock
PP_CMD_H<2:0>            Input, pull-up                   Parallel Port: Command Pins
PP_DATA_H<11:0>          Output                           Parallel Port: Data Pins
DISABLE_OUT_L            Input, pull-up                   Disables (tri-states) all output drivers
TEST_DATA_H              Input, pull-up                   Data for serially loading PCache
TEST_STROBE_H            Input, pull-up                   Strobe for serially loading PCache
OSC_TEST_H               Input                            Test clock enable. See Section 3.2.2
OSC_TC1_H, OSC_TC2_H     Input                            Test clocks. See Section 3.2.2
TEMP_H                   Output                           Temperature sensor. See Section 3.2.5



Testability Features 

The testability features facilitate the testing of the chip, module, or system. The testability
features are scattered throughout the NVAX CPU chip. The features implemented primarily use
internal scan registers, LFSR reducers, and the boundary scan register.

19.4 Parallel Test Port 

This port allows the critical chip nodes to be either controlled or monitored in parallel. The port 
consists of 15 dedicated test pins as follows: 

- PP_DATA_H<11:0>: Twelve data pins that provide control of, or observability of, various
internal nodes.

- PP_CMD_H<2:0>: Selects up to eight different test configurations at the parallel port.
Table 19-2 lists the Parallel Port's configurations.

NOTE 



1. When the parallel port is not in use, internal pull-ups on the PP_CMD_H<2:0> pins
force the port into an inactive (Ebox observe MAB) state.

2. The PP_CMD_H<0> pin is also used as a pseudo-TRST_L pin to reset the JTAG circuits.



Table 19-2: Parallel Port Operating Modes

Command Pins                                  Data Pins
PP_CMD_H<2:0>   Port Mode                     PP_DATA_H<11:0>     Signals Controlled/Observed

1 1 1           Observe Ebox MAB (default)    PP_DATA_H<11>       Internal PHI_2.
                                              PP_DATA_H<10:0>     Ebox MAB. See Section 9.5.

1 1 0           Observe Mbox                  PP_DATA_H<11:9>     S5 Reference Source. See Section
                                              PP_DATA_H<8:4>      S5 command. See Table 12-1.
                                              PP_DATA_H<3>        M%MME_FAULT_H.
                                              PP_DATA_H<2>        S5 Abort.
                                              PP_DATA_H<1>        S5 TB Miss.
                                              PP_DATA_H<0>        (illegible in source)

1 0 1           Observe Cbox/Mbox             PP_DATA_H<11:9>     Cbox (signal name illegible in source)
                                              PP_DATA_H<8>        Cbox DEALLOC.
                                              PP_DATA_H<7>        Cbox BC_HIT.
                                              PP_DATA_H<6:4>      Mbox MD Destination. See Section
                                              PP_DATA_H<3:0>      Mbox MME State. See Section 12.

1 0 0           Observe Ibox                  PP_DATA_H<11>       Internal PHI_2.
                                              PP_DATA_H<10:7>     Undefined.
                                              PP_DATA_H<6:0>      I-MAB. See Section 7.11.3.

0 1 1           Enable LFSR Mode              PP_DATA_H<11:0>     Undefined.

0 1 0           Undefined                     PP_DATA_H<11:0>     Undefined.

0 0 1           Shift ISRs                    PP_DATA_H<11:3>     ISR1 (Control Store data).
                                              PP_DATA_H<2:0>      ISR2 (Other internal scan data).

0 0 0           Force MAB                     PP_DATA_H<11:0>     Undefined. See Section 9.5.
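
As an illustration only, the command encodings of Table 19-2 can be held in a small lookup table, for example in a tester script that labels captured PP_CMD_H<2:0> values. The names `PARALLEL_PORT_MODES` and `port_mode` are hypothetical, and the label for command 110 is inferred from the S5 signals it observes because the mode name is not legible in the source.

```python
# Port modes keyed by the 3-bit PP_CMD_H<2:0> value (bit 2 is the MSB).
PARALLEL_PORT_MODES = {
    0b111: "Observe Ebox MAB (inactive default)",
    0b110: "Observe Mbox S5 signals (mode name inferred)",
    0b101: "Observe Cbox/Mbox",
    0b100: "Observe Ibox",
    0b011: "Enable LFSR Mode",
    0b010: "Undefined",
    0b001: "Shift ISRs",
    0b000: "Force MAB",
}


def port_mode(pp_cmd: int) -> str:
    """Return the port mode name for a PP_CMD_H<2:0> value in the range 0..7."""
    if not 0 <= pp_cmd <= 0b111:
        raise ValueError("PP_CMD_H<2:0> is a 3-bit field")
    return PARALLEL_PORT_MODES[pp_cmd]
```

Because the command pins have internal pull-ups, an undriven port reads as 111, which is why that entry is marked as the inactive default.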



19.4.1 Parallel Port Operation

Internal Scan Registers

When shifting, the ISR bits are serial-to-parallel converted. They change every third cycle on
internal PHI_4. This gives usable time with respect to the NDAL clocks. The parallel port
commands are captured synchronously with respect to the NDAL clocks, in NDAL phase 3. In
order to give full flexibility in capturing a given internal cycle, a mechanism is provided to delay
the capture-and-start-shifting event by 0, 1, or 2 cycles. This delay is determined by the state
of the parallel port bits PP_CMD<1:0> immediately before entering the Shift ISR mode ('00'
corresponds to zero delay, '01' corresponds to a 1-cycle delay, and '10' corresponds to a 2-cycle delay).
See the timing diagrams in Figure 19-2.
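
The delay selection described above amounts to a three-entry decode. The following is a minimal sketch; the function name `isr_capture_delay` is hypothetical, and rejecting the '11' encoding is an assumption, since the text does not describe it.

```python
# Delay (in internal cycles) selected by PP_CMD<1:0> just before entering Shift ISR mode.
ISR_CAPTURE_DELAY = {0b00: 0, 0b01: 1, 0b10: 2}


def isr_capture_delay(pp_cmd_low_bits: int) -> int:
    """Return the capture-and-start-shifting delay for a 2-bit PP_CMD<1:0> value."""
    if pp_cmd_low_bits not in ISR_CAPTURE_DELAY:
        # The text defines only '00', '01' and '10'; '11' is treated as invalid here.
        raise ValueError("PP_CMD<1:0> encoding not described in the specification")
    return ISR_CAPTURE_DELAY[pp_cmd_low_bits]
```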

Chapter 20

Electrical Characteristics 

20.1 Introduction 

This chapter specifies the electrical characteristics to which one must adhere in order to incorporate 
the chip in a system. Related information may be obtained from the following documents: 

1. NVAX Module Signal Integrity Handbook. 

2. CMOS-4 Technology File, revision 2.3. 

3. NVAX CPU Module Inter-chip Specification. 

4. NVAX CPU Chip Functional Specification, Chapter 3, Chapter 13, and Chapter 17. 

20.2 NVAX DC Operating Characteristics 
20.2.1 Maximum Ratings 



Table 20-1: Maximum Ratings

Parameter                        sym    min   max     units   comments
internal supply voltage          VDDi   3.0   3.465   Vdc     3.3V +5%/-10% including power supply ripple
external supply voltage          VDDe   3.0   3.465   Vdc     3.3V +5%/-10% including power supply ripple
power dissipation @ 10ns cycle                16.3    watts   measured at VDDi=VDDe=3.465V
power dissipation @ 12ns cycle                13.8    watts   measured at VDDi=VDDe=3.465V
power dissipation @ 14ns cycle                12.0    watts   measured at VDDi=VDDe=3.465V
power dissipation @ 18ns cycle                9.7     watts   measured at VDDi=VDDe=3.465V
junction temperature                    0     100     degC    specific ambient temperature depends on
                                                              board design and air flow

Table 20-2: Power Dissipation Across Voltage and Cycle Time

Cycle time    min@3.2V   max@3.2V   max@3.465V   max@3.6V   units
10ns cycle    8.3        13.9       16.3         17.6       watts
12ns cycle    7.1        11.8       13.8         14.9       watts
14ns cycle    6.2        10.3       12.0         13.0       watts
18ns cycle    5.0        8.3        9.7          10.4       watts



The power dissipation numbers given are worst-case average power dissipation measurements; 
they do not represent the peak instantaneous power dissipated on NVAX. The worst-case average 
power values were developed from the measured power dissipated when a worst-case pattern was 
run on an NVAX chip in a Neptune system. 
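
As a rough cross-check, which the specification itself does not state, the maximum figures in Table 20-2 follow the usual CMOS rule of thumb that dynamic power scales with the square of the supply voltage. The sketch below scales the max@3.2V column to the other two voltages and reports the largest disagreement with the table. The function names are hypothetical.

```python
# (max@3.2V, max@3.465V, max@3.6V) in watts, keyed by cycle time in ns (Table 20-2).
POWER_W = {
    10: (13.9, 16.3, 17.6),
    12: (11.8, 13.8, 14.9),
    14: (10.3, 12.0, 13.0),
    18: (8.3, 9.7, 10.4),
}


def scaled_power(p_ref_w: float, v_ref: float, v_new: float) -> float:
    """Scale a power figure assuming P is proportional to V squared."""
    return p_ref_w * (v_new / v_ref) ** 2


def max_scaling_error() -> float:
    """Largest |predicted - tabulated| power in watts over all table rows."""
    worst = 0.0
    for p32, p3465, p36 in POWER_W.values():
        worst = max(
            worst,
            abs(scaled_power(p32, 3.2, 3.465) - p3465),
            abs(scaled_power(p32, 3.2, 3.6) - p36),
        )
    return worst
```

Under this assumption the predictions stay within roughly 0.1 W of the tabulated values, which is about the rounding precision of the table.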

20.2.2 Pin Driver Impedance

Table 20-3 contains the acceptable range for output driver impedance, assuming worst case 
environmental skews. 



Table 20-3: NVAX Pin Driver Impedance

                        Rterm   Rterm   Rds    Rds    Z      Z
Name                    low     high    low    high   low    high
P%ACK_L                 10      15      19     37     29     52
P%CMD_H<3:0>            10      15      65     125    75     140
P%CPU_HOLD_L            12      18      20     41     32     59
P%CPU_REQ_L             12      18      20     41     32     59
P%CPU_SUPPRESS_L        12      18      20     41     32     59
P%DR_DATA_H<63:0>       11      17      20     41     31     58
P%DR_ECC_H<7:0>         11      17      20     41     31     58
P%DR_INDEX_H<20:3>      4       6       12     25     16     31
P%DR_OE_L               4       6       12     25     16     31
P%DR_WE_L               4       6       12     25     16     31
P%ID_H<2:0>             10      15      65     125    75     140
P%MACHINE_CHECK_H       10      15      65     125    75     140
P%NDAL_H<63:0>          10      15      65     125    75     140
P%PARITY_H<2:0>         10      15      65     125    75     140
P%PHI12_OUT_H           8       12      8      23     16     35
P%PHI23_OUT_H           8       12      8      23     16     35
P%PHI34_OUT_H           8       12      8      23     16     35
P%PHI41_OUT_H           8       12      8      23     16     35
P%PP_DATA_H<11:0>       10      15      65     125    75     140
P%SYS_RESET_L           12      18      20     41     32     59
P%TDO_H                 10      15      65     125    75     140
P%TS_ECC_H<5:0>         11      17      20     41     31     58
P%TS_INDEX_H<20:5>      12      18      20     41     32     59
P%TS_OE_L               12      18      20     41     32     59
P%TS_OWNED_H            11      17      20     41     31     58
P%TS_TAG_H<31:17>       11      17      20     41     31     58
P%TS_WE_L               12      18      20     41     32     59

Key to pin characteristics: 



Rterm — termination resistance 

Rds — device resistance 

Z — sum of resistance range 

Conditions of test: 



Vdd = 3.465v 

Tj = 0 and 100 degrees Centigrade 

Rds measured with the pin shorted to Vdd=3.465v for measuring N-MOS characteristics 

Rds measured with the pin shorted to Vss=0.0v for measuring P-MOS characteristics 

Pins cannot tolerate shorts for prolonged periods. The above information is provided for test purposes only. 
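
The key above defines Z as the sum of the termination and device resistances, so each Z entry in Table 20-3 can be checked against its Rterm and Rds entries. The following sketch does this for a representative subset of rows copied from the table. The helper names are hypothetical.

```python
# (Rterm low, Rterm high, Rds low, Rds high, Z low, Z high), copied from Table 20-3.
PIN_IMPEDANCE = {
    "P%ACK_L": (10, 15, 19, 37, 29, 52),
    "P%CMD_H<3:0>": (10, 15, 65, 125, 75, 140),
    "P%CPU_HOLD_L": (12, 18, 20, 41, 32, 59),
    "P%DR_DATA_H<63:0>": (11, 17, 20, 41, 31, 58),
    "P%DR_INDEX_H<20:3>": (4, 6, 12, 25, 16, 31),
    "P%PHI12_OUT_H": (8, 12, 8, 23, 16, 35),
}


def z_is_sum(row) -> bool:
    """True when Z low/high equal Rterm + Rds at the low and high ends of the range."""
    rterm_lo, rterm_hi, rds_lo, rds_hi, z_lo, z_hi = row
    return z_lo == rterm_lo + rds_lo and z_hi == rterm_hi + rds_hi


def inconsistent_pins(table=PIN_IMPEDANCE):
    """Return the names of pins whose Z range is not the sum of the other two ranges."""
    return [name for name, row in table.items() if not z_is_sum(row)]
```

A check of this kind is also a convenient way to catch transcription errors when the table is re-entered by hand.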

20.2.3 Pin Capacitance

Table 20-4: Maximum Pin Capacitance

Pin Types                                   Rating   Unit
I/O and output only pins                    12.0     pF
input only pins except for P%PHIXX_IN_H     7.5      pF
P%PHIXX_IN_H                                8.5      pF



Conditions of test (in simulation): 

measured as pin capacitance to VSSi with all other pins returned to VSSi 
Tj = 27 degrees Centigrade 

measured at DC, zero bias for the junction capacitors 



20.2.4 Pin Operating Levels

Table 20-5 summarizes the electrical characteristics for various pin operating levels. Table 20-6 
identifies the operating level associated with each unique pin group. 



Table 20-5: NVAX Pin Levels

Level Type         Vil     Vih     Vol        Iol     Voh        Ioh     MaxVin (1)   Leakage
TTL IO (2)         0.8     2.0     0.4        +2mA    2.5        -2mA    6V           100 uAmps
TTL IN             0.8     2.0                                           4.5V         100 uAmps
TTL IN PU (3)      0.8     2.0                                           Vdd+0.5V     ±200-900 uAmps
P%PP_CMD_H<2:0>    0.8     2.0                                           Vdd+0.5V     1000 uAmps
CMOS (4)           0.8     2.0     0.4        +2mA    2.6        -2mA    4.5V         100 uAmps
CMOS (5)           0.8     2.0     Vss+0.1V   +40uA   Vdd-0.1V   -40uA   4.5V         100 uAmps
ECL IN             -0.3V   +0.3V                                         Vdd+0.5V     100 uAmps
ACK IN (6)         0.8V    2.0     0.4        +17mA                      Vdd+0.5V     100 uAmps

(1) maximum voltage tolerable without incurring damage
(2) 5-volt tolerant
(3) pins with active pull-up or pull-down
(4) with TTL load
(5) with CMOS load
(6) active pull-up to 3.3 volts

Table 20-6: NVAX Pin Characteristics

Name                    Type    Level   Voltage   Pull-x
P%ACK_L                 B,OD    ACK     3
P%ASYNC_RESET_L         I       TTL     3         +
P%CMD_H<3:0>            B       TTL     5
P%CPU_GRANT_L           I       TTL     3
P%CPU_HOLD_L            O       TTL     5
P%CPU_REQ_L             O       TTL     5
P%CPU_SUPPRESS_L        O       TTL     5
P%CPU_WB_ONLY_L         I       TTL     3
P%DISABLE_OUT_L         I       TTL     3         +
P%DR_DATA_H<63:0>       B       TTL     5
P%DR_ECC_H<7:0>         B       TTL     5
P%DR_INDEX_H<20:3>      O       TTL     3
P%DR_OE_L               O       TTL     3
P%DR_WE_L               O       TTL     3
P%HALT_L                I       TTL     3
P%H_ERR_L               I       TTL     3
P%ID_H<2:0>             B       TTL     5
P%INT_TIM_L             I       TTL     3
P%IRQ_L<3:0>            I       TTL     3
P%MACHINE_CHECK_H       O       TTL     5
P%NDAL_H<63:0>          B       TTL     5
P%OSC_H                 I       ECL     3
P%OSC_L                 I       ECL     3
P%OSC_TC1_H             I       CMOS    3
P%OSC_TC2_H             I       CMOS    3
P%OSC_TEST_H            I       CMOS    3
P%PARITY_H<2:0>         B       TTL     5
P%PHI12_IN_H            I       CMOS    3
P%PHI12_OUT_H           O       CMOS    3
P%PHI23_IN_H            I       CMOS    3
P%PHI23_OUT_H           O       CMOS    3
P%PHI34_IN_H            I       CMOS    3
P%PHI34_OUT_H           O       CMOS    3
P%PHI41_IN_H            I       CMOS    3
P%PHI41_OUT_H           O       CMOS    3
P%PP_CMD_H<2:0>         I       TTL     3         +
P%PP_DATA_H<11:0>       O       TTL     5
P%PWRFL_L               I       TTL     3
P%SYS_RESET_L           O       TTL     5
P%S_ERR_L               I       TTL     3

Key to pin characteristics: 

LEVEL — threshold levels as per Table 20-5 

VOLTAGE — (5) 5V tolerant driver, (3) 3V tolerant driver - must not be exposed to 5V signals 

PULL-X — (+) active pull-up, (-) active pull-down 

TYPE — (B) bidirectional, (I) input, (O) output, (OD) open drain 

Table 20-6 (Cont.): NVAX Pin Characteristics

Name                    Type    Level   Voltage   Pull-x
P%TCK_H                 I       TTL     3         -
P%TDI_H                 I       TTL     3         +
P%TDO_H                 O       TTL     5
P%TEMP_H                O       < 3V    3
P%TEST_DATA_H           I       TTL     3         +
P%TEST_STROBE_H         I       TTL     3         +
P%TMS_H                 I       TTL     3         +
P%TS_ECC_H<5:0>         B       TTL     5
P%TS_INDEX_H<20:5>      O       TTL     5
P%TS_OE_L               O       TTL     5
P%TS_OWNED_H            B       TTL     5
P%TS_TAG_H<31:17>       B       TTL     5
P%TS_VALID_H            B       TTL     5
P%TS_WE_L               O       TTL     5


20.3 NVAX AC Operating Characteristics

This section specifies AC timing parameters, but is not intended to illustrate detailed
transactional operation.

20.3.1 AC Conditions of Test 

1. Tj = 0 to 70 degrees Centigrade 

2. VDDi = 3.3 volts +3.9%/-3% (3.2 to 3.43V) 

3. VDDe = 3.3 volts +3.9%/-3% (3.2 to 3.43V) 

4. Voltage levels used for timing specifications as per Table 20-5. 

5. Pin loading used for timing specifications as per Table 20-7. 
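
The voltage windows quoted in items 2 and 3 follow directly from the nominal supply and the percentage tolerances. A minimal sketch of that arithmetic is shown below. The function name `supply_window` is hypothetical.

```python
def supply_window(nominal_v: float, plus_pct: float, minus_pct: float):
    """Return (minimum, maximum) supply voltage for a nominal value and +/- tolerances in percent."""
    low = nominal_v * (1.0 - minus_pct / 100.0)
    high = nominal_v * (1.0 + plus_pct / 100.0)
    return low, high


# 3.3 V +3.9%/-3%, as specified for both VDDi and VDDe under AC test.
ac_low, ac_high = supply_window(3.3, plus_pct=3.9, minus_pct=3.0)
```

Rounded to the precision used in the list, the result matches the stated 3.2 V to 3.43 V range.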

Table 20-7: Pin Loading for AC Tests

                                              Loading required for
                                              chip test on Takeda 3381
Pin                     Total Pin Loading     Series Resistor   Series Capacitor
P%DR_INDEX_H<20:3>      140 pF                10 ohms           100 pF
P%DR_OE_L               140 pF                10 ohms           100 pF
P%DR_WE_L               140 pF                10 ohms           100 pF
P%TS_INDEX_H<20:5>      60 pF                 15 ohms           20 pF
P%TS_OE_L               60 pF                 15 ohms           20 pF
P%TS_WE_L               60 pF                 15 ohms           20 pF
P%PHIXX_OUT_H           70 pF                 22 ohms           30 pF
all others              40 pF                 none              none



The AC conditions of test given were designed specifically with the Neptune and Omega systems 
in mind, in order to maximize chip yield. The AC conditions of test may be changed in the future 
depending upon chip yields and the needs of the system partners. 



DIGITAL CONFIDENTIAL 






NVAX CPU Chip Functional Specification, Revision 1.2, December 1991 






NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 



20.3.2 NDAL Timing Specification 

NDAL signal timing is specified as phase and constant offsets from the NDAL clock inputs. The 
chip operating frequency determines the phase time. 

Figure 20-1: NDAL Pin Timing Relative to the NDAL CLOCKS

[Timing diagram over one NDAL cycle (phases P1 to P4) showing P%PHI12_IN_H, P%PHI23_IN_H,
P%PHI34_IN_H and P%PHI41_IN_H together with the signal groups below.]

P%ID_H<2:0>, P%PARITY_H<2:0>, P%NDAL_H<63:0>, P%CMD_H<3:0>
    As driven by NVAX CPU: driven from P%PHI12_IN_H rising edge; released with P%PHI41_IN_H
    rising edge.
    As received by NVAX CPU: latch closes with P%PHI41_IN_H rising edge (latch open during phi23).

P%ACK_L
    As pulled low by NVAX CPU and pulled high through board pullup resistor: NVAX pulls low with
    P%PHI23_IN_H rising; NVAX releases with P%PHI23_IN_H falling.
    As required by NVAX CPU: latch closes with P%PHI34_IN_H rising edge (latch open during phi12).

P%CPU_HOLD_L, P%CPU_SUPPRESS_L, P%CPU_REQ_L
    As driven by NVAX CPU: driven with P%PHI12_IN_H rising edge.

P%CPU_WB_ONLY_L, P%CPU_GRANT_L
    As required by NVAX CPU: latch closes with P%PHI41_IN_H rising edge (latch open during phi23).

Table 20-8: NDAL AC timing specs

Input Pin            Setup Time (1)                   Hold Time
P%NDAL_H<63:0>       1 phase to P%PHI41_IN_H R        P%PHI41_IN_H R + 2ns (2)
P%CMD_H<3:0>
P%ID_H<2:0>
P%PARITY_H<2:0>

P%ACK_L              0 ns to P%PHI34_IN_H R           P%PHI34_IN_H R + 1 phase

P%CPU_WB_ONLY_L      0 ns to P%PHI41_IN_H R           P%PHI41_IN_H R + 1 phase
P%CPU_GRANT_L

Output Pin           Output Valid                     Output Tristate
P%NDAL_H<63:0>       P%PHI12_IN_H R + 2 phases        P%PHI41_IN_H R + 1 phase
P%CMD_H<3:0>
P%ID_H<2:0>
P%PARITY_H<2:0>

P%ACK_L              P%PHI23_IN_H R + 1 phase (low
                     transition), P%PHI23_IN_H F + 3
                     phases (high transition) (3)

P%CPU_HOLD_L         P%PHI12_IN_H R + 1 phase
P%CPU_SUPPRESS_L
P%CPU_REQ_L

(1) R means the rising edge of the clock is used; F means the falling edge of the clock is used.
(2) Data may be held capacitively during the hold time.
(3) P%ACK_L is pulled up to 3.3v through a resistor in the system and in the test environment.

20.3.3 BCACHE Timing Specification

Due to the chip's clocking structure, BCACHE timing is specified as relative timing between the
various BCACHE signals.

Figure 20-2 and Figure 20-3 show the Bcache timing for a generic NVAX system. Table 20-9 and
Table 20-10 specify the RAM timing constraints under the "NVAX Input Requirement" subtitle, and
the guaranteed chip output under "NVAX Output Response". This data should be used to establish
chip input requirements and output responses in a generic system environment. Signal delays
are dependent on chip packaging and board design.

For the OMEGA system, the chip must meet the input constraints and the output responses
specified in Table 20-11 and Table 20-12. The OMEGA system operates at a 14ns clock cycle, using a
128 KB cache with 16Kx4 25ns data RAMs and 4Kx4 25ns tag RAMs. This configuration requires
the Bcache processor register settings CCTL(DATA_SPEED)=01 and CCTL(TAG_SPEED)=1 to
allow one slip cycle for both data and tag RAM access. See Chapter 13.

The specific timing for XNP systems is shown in Figure 20-4 and Figure 20-5. The chip must
meet the input constraints and the output responses specified in Table 20-13 and Table 20-14.
The XNP system operates at a 14, 12, or 10ns clock cycle, using a 2 MB cache with 256Kx4
20ns data RAMs and 64Kx4 15ns tag RAMs. This configuration requires the Bcache processor
register settings CCTL(DATA_SPEED)=01 and CCTL(TAG_SPEED)=1 to allow one slip cycle for
both data and tag RAM access.

The timing constraints for both the OMEGA and XNP systems are based upon the RAM
specifications rather than upon NVAX predicted behavior. Actual signal delays are dependent on chip
packaging and board design.
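
The cache sizes quoted for the two systems are consistent with the data RAM depths if the data array is one RAM deep across the 64-bit P%DR_DATA_H<63:0> bus. That single-bank organization is an assumption made for this sketch, not something stated above, and the function names are hypothetical. ECC RAMs are not counted.

```python
DATA_BUS_BITS = 64  # width of P%DR_DATA_H<63:0>


def bcache_bytes(ram_depth_words: int) -> int:
    """Cache data capacity in bytes, assuming one bank of RAMs spanning the 64-bit data bus."""
    return ram_depth_words * DATA_BUS_BITS // 8


def data_ram_chips(ram_width_bits: int = 4) -> int:
    """Number of x4 data RAMs needed to span the data bus (ECC bits excluded)."""
    return DATA_BUS_BITS // ram_width_bits


omega_bytes = bcache_bytes(16 * 1024)   # 16Kx4 data RAMs
xnp_bytes = bcache_bytes(256 * 1024)    # 256Kx4 data RAMs
```

With these inputs the OMEGA configuration works out to 128 KB and the XNP configuration to 2 MB, matching the sizes given in the text.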

Figure 20-2: Generic Data RAM Pad Timing

[Timing diagram. Panels: data RAM read with an aborted lookup; data RAM read-modify write; data
RAM quadword write followed by read. Traces: STATE, P%DR_INDEX_H, P%DR_DAT_H, P%DR_OE_L and
P%DR_WE_L, plotted against phases P1 to P4 of each cycle (tcycle). Marked intervals include Toe,
Toh, Tohz and Tow; see Table 20-9.]

Table 20-9: Generic Data RAM Timing Specification

Param   Function                                   NVAX Input Requirement
Taa     address access to RAM data valid           < (7 phases - P%DR_INDEX_H drive-to-valid delay)
Toe     OE assertion to RAM data valid             < (5 phases - P%DR_OE_L deasserted-to-asserted delay)
Toh     RAM data output hold from INDEX change     > 2.0ns
Tohz    OE deassertion to RAM data high-z          < (4 phases - P%DR_OE_L asserted-to-deasserted delay)

Param   Function                                   NVAX Output Response
Tto     data high-z to OE assertion                > 0.0ns
Tdw     data valid to WE deassertion               > (5 phases - P%DR_DAT_H drive-to-valid delay)
Twp     WE pulse                                   > (6 phases - P%DR_WE_L deasserted-to-asserted delay)
Taw     address valid to end of write              > (10 phases - P%DR_INDEX_H drive-to-valid delay)
Tnz     NVAX tristate time                         < 1 phase
Twr     write recovery (WE deassertion to          > 0.0ns
        INDEX change)
Tdh     data hold after WE deassertion             > 0.0ns
Tas     address setup                              > 0.0ns
Tow     OE deassertion to WE assertion             > 0.0ns
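
The phase-based limits in Table 20-9 can be turned into nanoseconds once a cycle time and a pad delay are chosen. The sketch below assumes that one phase is a quarter of the internal cycle, as suggested by the P1 to P4 markers in the timing diagrams. The delay arguments are hypothetical inputs that depend on packaging and board design, and the function names are illustrative. Slip cycles selected through CCTL are not modelled.

```python
def phase_ns(cycle_ns: float) -> float:
    """Duration of one phase, assuming four equal phases per internal cycle."""
    return cycle_ns / 4.0


def taa_limit_ns(cycle_ns: float, index_delay_ns: float) -> float:
    """Upper bound on RAM address access time: 7 phases minus the INDEX drive-to-valid delay."""
    return 7 * phase_ns(cycle_ns) - index_delay_ns


def toe_limit_ns(cycle_ns: float, oe_delay_ns: float) -> float:
    """Upper bound on OE-to-data-valid time: 5 phases minus the OE assertion delay."""
    return 5 * phase_ns(cycle_ns) - oe_delay_ns
```

For example, with a 12 ns cycle and an assumed 3 ns INDEX delay, `taa_limit_ns(12.0, 3.0)` gives 18 ns as the slowest usable data RAM access time under these assumptions.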

Figure 20-3: Generic TAG RAM Pad Timing

[Timing diagram. Panels: TAG RAM read followed by another read; TAG RAM quadword write followed
by read. Traces: STATE, P%TS_INDEX_H, P%TS_TAG_H and P%TS_WE_L, plotted against phases P1 to P4
of each cycle (tcycle). Marked intervals include Tnz, Twr and Tto; see Table 20-10.]

Table 20-10: Generic Tag RAM Timing Specification

Param   Function                             NVAX Input Requirement
Taa     address access to RAM data valid     < (7 phases - P%DR_INDEX_H drive-to-valid delay - 1.5ns)
Toe     OE assertion to RAM data valid

Param   Function                             NVAX Output Response
Tto     high-z to OE assertion               > 1.5ns
Tdw     data valid to WE deassertion         > (5 phases - P%TS_TAG_H drive-to-valid delay)
Twp     WE pulse                             > (6 phases - P%TS_WE_L drive-to-valid delay)
Twr     write recovery                       > -2.0ns
Tdh     data hold time                       > 1.0ns
Tas     address setup                        > (4 phases - P%TS_INDEX_H drive-to-valid delay)

Table 20-11: OMEGA-Specific Data RAM Timing Specification

128KB Bcache, 25ns Data RAMs, 25ns Tag RAMs, 14ns cycle

NVAX Test Input Requirements

Param    Function                          Timing      Measuring Point            Notes
Taa      address access to data valid      > 25.0ns    INDEX 2.4H/.4L             must be met before tester drives data
Toe      OE assertion to data valid        > 12.0ns    OE .4L                     must be met before tester drives data
Toh      output hold                       < 3.0ns     INDEX .4H/2.4L             tester hold time
Tohz     OE deassertion to data high-z     > 10.0ns    OE 2.4H                    tester hold time, chip overdrives
Tcycle   internal cycle time               14.0ns

NVAX Test Output Responses

Param    Function                          Timing      Measuring Point            Notes
Tto      high-z to OE assertion            > 0.0ns     OE 2.4L
Tdw      data valid to WE deassertion      > 10.0ns    DAT 2.4H/.4L
Twp      WE pulse                          > 15.0ns    WE .4L, WE 2.4H
Twr      write recovery                    > 0.0ns     WE .4L, INDEX .4H/2.4L
Tdh      data hold time                    > 0.0ns     WE 2.4H
Tas      address setup                     > 0.0ns     INDEX 2.4H/.4L, WE 2.4L
Tow      OE deassertion to WE assertion    > 0.0ns     OE 2.4H, WE .4L

Table 20-12: OMEGA Specific Tag RAM Timing Specification

512KB Bcache, 12ns Data RAMs, 12ns Tag RAMs

NVAX Test Input Requirements

Param    Function                          Timing      Measuring Point            Notes
Taa      address access to data valid      > 12.0ns    INDEX 2.4H/.4L             must be met before tester drives data
Toe      OE assertion to data valid        > 6.0ns     OE .4L                     must be met before tester drives data
Tcycle   internal cycle time               14.0ns

NVAX Test Output Responses

Param    Function                          Timing      Measuring Point            Notes
Tto      high-z to OE assertion            > 0.0ns     OE 2.4L
Tdw      data valid to WE deassertion      > 6.0ns     DAT 2.4H/.4L
Twp      WE pulse                          > 12.0ns    WE .4L, WE 2.4H
Twr      write recovery                    > 0.0ns     WE .4L, INDEX .4H/2.4L
Tdh      data hold time                    > 0.0ns     WE 2.4H
Tas      address setup                     > 0.0ns     INDEX 2.4H/.4L, WE 2.4L

Figure 20-4: XNP Data RAM Pad Timing

[Timing diagram. Panels: data RAM read, aborted due to tag miss; data RAM read-modify write; data
RAM quadword write followed by read. Traces: STATE, P%DR_INDEX_H, P%DR_DAT_H, P%DR_OE_L and
P%DR_WE_L, plotted against phases P1 to P4 of each cycle. Marked intervals include Toh, Tohz, Tow,
Tdw, Twp and Tdh; see Table 20-13.]

Table 20-13: XNP Specific Data RAM Timing Specification

2MB Bcache, 20ns Data RAMs, 15ns Tag RAMs

NVAX Test Input Requirements

Param    Function                          Timing                    Measuring Point            Notes
Taa      address access to data valid      > 20.0ns                  INDEX 2.4H/.4L             must be met before tester drives data
Toe      OE assertion to data valid        > 10.0ns                  OE .4L                     must be met before tester drives data
Toh      output hold                       < 5.0ns                   INDEX .4H/2.4L             tester hold time
Tohz     OE deassertion to data high-z     > 10.0ns                  OE 2.4H                    tester hold time, chip overdrives
Tcycle   internal cycle time               14.0, 12.0, or 10.0ns

NVAX Test Output Responses

Param    Function                          Timing                    Measuring Point            Notes
Tto      high-z to OE assertion            > 0.0ns                   OE 2.4L
Tdw      data valid to WE deassertion      > 12.0ns                  DAT 2.4H/.4L
Twp      WE pulse                          > 14.0ns                  WE .4L, WE 2.4H
Twr      write recovery                    > 0.0ns                   WE .4L, INDEX .4H/2.4L
Tdh      data hold time                    > 0.0ns                   WE 2.4H
Tas      address setup                     > 0.0ns                   INDEX 2.4H/.4L, WE 2.4L
Tow      OE deassertion to WE assertion    > 0.0ns                   OE 2.4H, WE .4L

Figure 20-5: XNP TAG RAM Pad Timing

[Timing diagram. Panels: TAG RAM read followed by another read; TAG RAM quadword write followed
by read. Traces: STATE, P%TS_INDEX_H, P%TS_TAG_H and P%TS_OE_L, plotted against phases P1 to P4
of each cycle (tcycle). Marked intervals include Twp and Tdh; see Table 20-14.]

Table 20-14: XNP Specific Tag RAM Timing Specification

2MB Bcache, 20ns Data RAMs, 15ns Tag RAMs

NVAX Test Input Requirements

Param    Function                          Timing                    Measuring Point            Notes
Taa      address access to data valid      > 15.0ns                  INDEX 2.4H/.4L             must be met before tester drives data
Toe      OE assertion to data valid        > 8.0ns                   OE .4L                     must be met before tester drives data
Tcycle   internal cycle time               14.0, 12.0, or 10.0ns

NVAX Test Output Responses

Param    Function                          Timing                    Measuring Point            Notes
Tto      high-z to OE assertion            > 0.0ns                   OE 2.4L
Tdw      data valid to WE deassertion      > 7.0ns                   TAG 2.4H/.4L
Twp      WE pulse                          > 15.0ns                  WE .4L, WE 2.4H
Twr      write recovery                    > 0.0ns                   WE .4L, INDEX .4H/2.4L
Tdh      data hold time                    > 0.0ns                   WE 2.4H
Tas      address setup                     > 0.0ns                   INDEX 2.4H/.4L, WE 2.4L

20.3.4 Other Pin Timing Specifications

20.3.4.1 Clock Timing

When P%OSC_TEST_H is not asserted, the chip receives the master clock through the P%OSC_H
and P%OSC_L pins. Operation of the chip at the maximum internal clock speed of 100 MHz
requires an input clock frequency of 400 MHz. These pins require capacitively-coupled,
differential waveforms, 180 degrees out of phase at inverted ECL levels. The peak-to-peak differential
voltage must be at least 600 mV, with a differential symmetry of 60/40 or better. The voltage
should not exceed the absolute value of Vdd plus 500 mV during operation.

When P%OSC_TEST_H is asserted, the chip receives the master clock through the P%OSC_TC1_H
and P%OSC_TC2_H pins. Operation of the chip at the maximum internal clock speed of
100 MHz requires an input clock frequency of 200 MHz. These pins require waveforms that are
90 degrees out of phase at CMOS input levels (Table 20-5). Each edge must be placed within an
accuracy of ± 24 degrees.

The chip provides four double-phase NDAL clocks on the P%PHIXX_OUT_H pins. The chip also
receives these clocks through the P%PHIXX_IN_H pins. The relationship of the four clocks to
the internal CPU clock cycle is shown in Figure 20-6.
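
The two clocking modes imply fixed ratios between the input clock and the internal clock: 4:1 on the differential oscillator pins and 2:1 on the test clock pins. The sketch below applies those ratios to other cycle times, which is an extrapolation from the single 100 MHz data point given above. It also converts the ± 24 degree edge tolerance to time, assuming the angle is measured against the test clock period. The function names are hypothetical.

```python
def internal_clock_mhz(cycle_ns: float) -> float:
    """Internal clock frequency in MHz for a given internal cycle time in ns."""
    return 1000.0 / cycle_ns


def required_osc_mhz(cycle_ns: float, test_clock: bool = False) -> float:
    """Input clock frequency: 4x internal on P%OSC_H/L, 2x internal on P%OSC_TC1_H/TC2_H."""
    ratio = 2 if test_clock else 4
    return ratio * internal_clock_mhz(cycle_ns)


def edge_tolerance_ns(osc_mhz: float, tolerance_deg: float = 24.0) -> float:
    """Edge placement tolerance in ns, assuming degrees refer to one input clock period."""
    period_ns = 1000.0 / osc_mhz
    return period_ns * tolerance_deg / 360.0
```

For a 10 ns cycle this reproduces the 400 MHz and 200 MHz figures. At 200 MHz, a 24 degree tolerance corresponds to about a third of a nanosecond.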



Figure 20-6: Relationship of Internal and NDAL Clock Cycles

[Timing diagram. Traces: CPU CYCLE (internal phases PHI1 to PHI4, repeated 1 2 3 4), NDAL CYCLE,
PHI12_OUT_H, PHI23_OUT_H, PHI34_OUT_H and PHI41_OUT_H.]

The following skew specifications must be met for all NDAL clock receivers. Inter-clock skew is
dependent on the electrical characteristics of the chip environment.

1. The rising edge of any clock will be present at all receivers within ± 0.5 ns, as measured from
the CMOS Vih level (see Table 20-5).

2. The falling edge of any clock will be present at all receivers within ± 0.5 ns, as measured
from the CMOS Vil level.

3. The skew between the rising edge of any phase and the falling edge of any other phase will
be no more than ± 0.75 ns, as measured from Voh to Vol.

4. The NDAL clocks will have an edge rate of 2.0 ns or better, measured at the receiver, between
the 10% and 90% points.

20.3.4.2 Reset Timing

P%ASYNC_RESET_L is an asynchronous input. It must be asserted for a minimum
of 7 NDAL cycles. The P%SYS_RESET_L output is asserted asynchronously whenever
P%ASYNC_RESET_L is asserted. P%SYS_RESET_L is deasserted synchronously with the
rising edge of P%PHI12_OUT_H. Figure 20-7 shows the relationship between the reset signals
and the clocks.



Figure 20-7: System Reset Timing

[Timing diagram not reproduced. Signals shown: NDAL CYCLE, P%ASYNC_RESET_L, P%SYS_RESET_L,
P%PHI12_OUT_H, P%PHI23_OUT_H, P%PHI34_OUT_H, and P%PHI41_OUT_H, against phases P1 through P4.
Annotations: P%ASYNC_RESET_L is asserted for a minimum of 7 NDAL cycles; asynchronous
assertion of P%ASYNC_RESET_L causes asynchronous assertion of P%SYS_RESET_L.]



The clock generator can be reset to a known state by using the P%TEST_DATA_H input as shown
in Figure 20-8. With P%ASYNC_RESET_L asserted, all clock inputs are stopped briefly (500
ns maximum). The states of test clocks P%OSC_TC1_H and P%OSC_TC2_H when stopped must
be the same, either both high or both low. P%TEST_DATA_H should be driven low to effect
the clock generator reset. This immediately places the clock generator into NVAX phase 2 and
NDAL phase 1. P%TEST_DATA_H is then driven high and clocking of the chip is resumed. On the
first oscillator cycle following resumption of clocking, the generator will transition into NVAX
phase 3 and begin normal sequencing. P%ASYNC_RESET_L must remain asserted for at least 7 NDAL
cycles following resumption of clocking.
Figure 20-8: Clock Generator Reset Timing

[Timing diagram not reproduced. Signals shown: CPU Phase, NDAL Phase, P%OSC_H, P%OSC_L,
P%OSC_TC1_H, P%OSC_TC2_H, INTERNAL OSC, P%ASYNC_RESET_L, P%SYS_RESET_L, and P%TEST_DATA_H.
The diagram is divided into a CHIP POWER-UP interval and a CLOCK RESET SEQUENCE interval and
is keyed to Timing Notes 1 through 4. P%TEST_DATA_H is annotated with setup, assert, and
hold times of 10 ns minimum each.]


- K_SEC%OSC1_H is the internal master clock produced from either the P%OSC_H and
  P%OSC_L inputs, or the P%OSC_TC1_H and P%OSC_TC2_H inputs. P%OSC_TEST_H
  is used to select the clock source as described in this clock specification.

- S indicates a static (non-changing) NDAL phase.

Timing Notes:

1. ECL pin inputs P%OSC_H and P%OSC_L must be used to supply clocks to the chip prior
   to and during power-up. Inputs P%OSC_TC1_H and P%OSC_TC2_H must be held low in order to
   prevent latch-up.

2. Switch to test clocks P%OSC_TC1_H and P%OSC_TC2_H. Start measure out lpat on chip tester.

3. Clocks restarted to restore internal chip signals prior to clock-reset sequence.

4. P%ASYNC_RESET_L must remain asserted for a minimum of 7 NDAL cycles
   following restart of clocks.



20.3.4.3 Interrupt, Error, and Test Pin Timing 

P%DISABLE_OUT_L and P%TCK_H are asynchronous inputs.
P%TEMP_H is an asynchronous output.

When P%PP_CMD_H<2:0> selects # 2 on P%PP_DATA_H<11> (see Chapter 19), the output
is asynchronous.

The timing for the interrupt, parallel port, serial port, boundary scan, and error pins is
shown in Table 20-15.
Table 20-15: Interrupt, Test, and Boundary Scan Pin AC timing specs

Input Pin            Setup Time (1)                Hold Time

P%PWRFL_L            3.0 ns to P%PHI41_IN_H R      P%PHI41_IN_H R + 0.0 ns
P%HALT_L             3.0 ns to P%PHI41_IN_H R      P%PHI41_IN_H R + 0.0 ns
P%H_ERR_L            3.0 ns to P%PHI41_IN_H R      P%PHI41_IN_H R + 0.0 ns
P%INT_TIM_L          3.0 ns to P%PHI41_IN_H R      P%PHI41_IN_H R + 0.0 ns
P%S_ERR_L            3.0 ns to P%PHI41_IN_H R      P%PHI41_IN_H R + 0.0 ns

P%IRQ_L<3:0>         1 phase to P%PHI41_IN_H R     P%PHI41_IN_H R + 1 phase

P%TEST_DATA_H        1 phase to P%PHI41_IN_H R     P%PHI41_IN_H R + 1 phase
P%TEST_STROBE_H      1 phase to P%PHI41_IN_H R     P%PHI41_IN_H R + 1 phase

P%PP_CMD_H<2:0>      1 phase to P%PHI41_IN_H R     P%PHI41_IN_H R + 1 phase

P%TDI_H              3.0 ns to P%TCK               P%TCK F + 3.0 ns
P%TMS_H              3.0 ns to P%TCK               P%TCK F + 3.0 ns

Output Pin           Output Valid

P%MACHINE_CHECK_H    P%PHI12_IN_H R + 1 phase
P%PP_DATA_H<10:0>    P%PP_DATA_H<11> R + 3 phases
P%TDO_H              P%TCK R + 10.0 ns

(1) R means the rising edge of the clock is used; F means the falling edge of the clock is used.
20.4 Revision History

Table 20-16: Revision History

Who              When           Description of change

Rebecca Stamm    3-Dec-1991     Revision 1.2, correct power supply numbers, leakage
                                numbers, AC conditions of test
Rebecca Stamm    9-Oct-1991     Revision 1.1, update power numbers and AC test
                                conditions
John F. Brown    30-Aug-1991    Revision 1.0, first edition released
John F. Brown    20-Jun-1991    Revision 0.1, first edition for review
Mike Uhler       13-Feb-1991    Revision 0.0, add template
Appendix A

Processor Register Definitions

This appendix contains the SDL (Structure Definition Language) definitions for the NVAX
processor registers. These definitions are used by chip verification code, and it is strongly
recommended that software groups use the same definitions to minimize errors in generating
new definitions.

NOTE

The file shown below is maintained by the NVAX CPU chip design group and is constantly
being updated as changes are made to the design. It is included here simply
as a means to document processor register definitions used in examples throughout
this specification. The latest machine-readable version of this file should always be
obtained from the NVAX CPU chip design group.



module $PR19DEF;

/*+
/* NVAX-Specific Processor Register Definitions
/*
/* To convert this file to a macro library, do the following:
/*
/*      SDL/LANGUAGE=MACRO/COPYRIGHT/VMS_DEVELOPMENT/LIST PR19DEF
/*      LIBRARY/CREATE/MACRO/SQUEEZE PR19DEF PR19DEF
/*
/*-

constant REVISION equals 30 prefix PR19$;       /* Revision number of this file

/* In the definitions below, registers are annotated with one of the following
/* symbols:
/*
/*      RW - The register may be read and written
/*      RO - The register may only be read
/*      WO - The register may only be written
/*
/* For RO and WO registers, all bits and fields within the register are also
/* read-only or write-only.  For RW registers, each bit or field within
/* the register is annotated with one of the following:
/*
/*      RW - The bit/field may be read and written
/*      RO - The bit/field may be read
/*      WO - The bit/field may be written
/*      WZ - The bit/field may be written; reads return a 0
/*      WC - The bit/field may be read; writes cause state to clear
/*      RC - The bit/field may be read, which also causes state to clear; writes are ignored
/*
aggregate PR19DEF union prefix PR19$;

/* Architecturally-defined registers which have different characteristics
/* on this CPU.

constant TODR equals %x1B tag $;        /* Time Of Year Register (RW)
constant MCESR equals %x26 tag $;       /* Machine check error register (WO)
constant SAVPC equals %x2A tag $;       /* Console saved PC (RO)



constant SAVPSL equals %x2B tag $;      /* Console saved PSL (RO)
PR19SAVPSL_BITS structure fill prefix SAVPSL$;
    PSL_LO bitfield length 8 mask;      /* Saved PSL bits <7:0>
    HALTCODE bitfield length 6 mask;    /* Halt code containing one of the following values
        constant HALT_HLTPIN equals %x2;        /* HALT_L pin asserted
        constant HALT_PWRUP equals %x3;         /* Initial powerup
        constant HALT_INTSTK equals %x4;        /* Interrupt stack not valid
        constant HALT_DOUBLE equals %x5;        /* Machine check during exception processing
        constant HALT_HLTINS equals %x6;        /* Halt instruction in kernel mode
        constant HALT_ILLVEC equals %x7;        /* Illegal SCB vector (bits<1:0>=11)
        constant HALT_WCSVEC equals %x8;        /* WCS SCB vector (bits<1:0>=10)
        constant HALT_CHMFI equals %xA;         /* CHMx on interrupt stack
        constant HALT_IE0 equals %x10;          /* ACV/TNV during machine check processing
        constant HALT_IE1 equals %x11;          /* ACV/TNV during KSNV processing
        constant HALT_IE2 equals %x12;          /* Machine check during machine check processing
        constant HALT_IE3 equals %x13;          /* Machine check during KSNV processing
        constant HALT_IE_PSL_101 equals %x19;   /* PSL<26:24>=101 during interrupt or exception
        constant HALT_IE_PSL_110 equals %x1A;   /* PSL<26:24>=110 during interrupt or exception
        constant HALT_IE_PSL_111 equals %x1B;   /* PSL<26:24>=111 during interrupt or exception
        constant HALT_REI_PSL_101 equals %x1D;  /* PSL<26:24>=101 during REI
        constant HALT_REI_PSL_110 equals %x1E;  /* PSL<26:24>=110 during REI
        constant HALT_REI_PSL_111 equals %x1F;  /* PSL<26:24>=111 during REI
    INVALID bitfield length 1 mask;     /* Invalid SAVPSL if = 1
    MAPEN bitfield length 1 mask;       /* MAPEN<0>
    PSL_HI bitfield length 16 mask;     /* Saved PSL bits <31:16>
end PR19SAVPSL_BITS;


constant IORESET equals %x37 tag $;     /* I/O system reset register (WO)

constant PME equals %x3D tag $;         /* Performance monitoring enable (RW)

constant SID equals %x3E tag $;         /* System identification register (RO)
PR19SID_BITS structure fill prefix SID$;
    UCODE_REV bitfield length 8 mask;           /* Microcode (chip) revision
    NONSTANDARD_PATCH bitfield length 1 mask;   /* PCS loaded with a non-standard patch
    PATCH_REV bitfield length 5 mask;           /* Patch revision number
    FILL_1 bitfield length 10 fill tag $$;
    TYPE bitfield length 8 mask;                /* CPU type code (19 decimal for NVAX)
end PR19SID_BITS;
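
To illustrate how a structure such as the SID layout maps onto a 32-bit register value, the following sketch decodes a raw longword into named fields. It assumes that bitfields are allocated in declaration order starting at bit 0; the helper names and the sample value are hypothetical and are not part of the SDL file.

```python
# Field widths taken from the SID structure; order is the declaration order.
# ASSUMPTION: the first declared field occupies the least significant bits.
SID_FIELDS = [
    ("UCODE_REV", 8),
    ("NONSTANDARD_PATCH", 1),
    ("PATCH_REV", 5),
    ("FILL_1", 10),
    ("TYPE", 8),
]


def decode_fields(value: int, fields) -> dict:
    """Split a register value into named fields, low bits first."""
    out = {}
    shift = 0
    for name, width in fields:
        out[name] = (value >> shift) & ((1 << width) - 1)
        shift += width
    return out


def decode_sid(value: int) -> dict:
    return decode_fields(value & 0xFFFFFFFF, SID_FIELDS)


if __name__ == "__main__":
    # Made-up sample: type 19, patch revision 3, non-standard patch, ucode rev 0x42.
    sample = (19 << 24) | (3 << 9) | (1 << 8) | 0x42
    print(decode_sid(sample))
```

The same `decode_fields` helper can be reused for any other structure in this file by supplying its own list of names and widths, provided the widths sum to 32.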
/* System-level required registers.

/* These registers are for testability and diagnostics use only.
/* They should not be referenced in normal operation.

constant IAK14 equals %x40 tag $;       /* Level 14 interrupt acknowledge (RO)
constant IAK15 equals %x41 tag $;       /* Level 15 interrupt acknowledge (RO)
constant IAK16 equals %x42 tag $;       /* Level 16 interrupt acknowledge (RO)
constant IAK17 equals %x43 tag $;       /* Level 17 interrupt acknowledge (RO)

PR19IAK_VECTOR structure fill prefix IAK$;      /* Vector returned in response to IAK1x read
    IPL17 bitfield length 1 mask;       /* Force IPL 17, independent of actual level
    PR bitfield length 1 mask;          /* Passive release
    SCB_OFFSET bitfield length 14 mask; /* LW offset in SCB of interrupt vector
    FILL_1 bitfield length 16 fill tag $$;
end PR19IAK_VECTOR;

constant CWB equals %x44 tag $;         /* Clear write buffers (RW)
/* Ebox registers.

/* Ebox register definition

constant INTSYS equals %x7A tag $;      /* Interrupt system status register (RW)
PR19INTSYS_BITS structure fill prefix INTSYS$;
    ICCS6 bitfield length 1 mask;       /* ICCS<6> (RO)
    SISR bitfield length 15 mask;       /* SISR<15:1> (RO)
    INT_ID bitfield length 5 mask;      /* ID of highest pending interrupt (RO)
        constant INT_ID_HALT equals %x1F;       /* Halt pin
        constant INT_ID_PWRFL equals %x1E;      /* Power fail
        constant INT_ID_H_ERR equals %x1D;      /* Hard error
        constant INT_ID_INT_TIM equals %x1C;    /* Interval timer
        constant INT_ID_PMON equals %x1B;       /* Performance monitor
        constant INT_ID_S_ERR equals %x1A;      /* Soft error
        constant INT_ID_IRQ3 equals %x17;       /* IPL 17 device interrupt
        constant INT_ID_IRQ2 equals %x16;       /* IPL 16 device interrupt
        constant INT_ID_IRQ1 equals %x15;       /* IPL 15 device interrupt
        constant INT_ID_IRQ0 equals %x14;       /* IPL 14 device interrupt
        constant INT_ID_SISR15 equals %x0F;     /* SISR<15>
        constant INT_ID_SISR14 equals %x0E;     /* SISR<14>
        constant INT_ID_SISR13 equals %x0D;     /* SISR<13>
        constant INT_ID_SISR12 equals %x0C;     /* SISR<12>
        constant INT_ID_SISR11 equals %x0B;     /* SISR<11>
        constant INT_ID_SISR10 equals %x0A;     /* SISR<10>
        constant INT_ID_SISR9 equals %x09;      /* SISR<9>
        constant INT_ID_SISR8 equals %x08;      /* SISR<8>
        constant INT_ID_SISR7 equals %x07;      /* SISR<7>
        constant INT_ID_SISR6 equals %x06;      /* SISR<6>
        constant INT_ID_SISR5 equals %x05;      /* SISR<5>
        constant INT_ID_SISR4 equals %x04;      /* SISR<4>
        constant INT_ID_SISR3 equals %x03;      /* SISR<3>
        constant INT_ID_SISR2 equals %x02;      /* SISR<2>
        constant INT_ID_SISR1 equals %x01;      /* SISR<1>
crnstar.t r::r~ri ::c Zt:Z equals
    FILL_1 bitfield length 3 fill tag $$;
    INT_TIM_RESET bitfield length 1 mask;       /* Interval timer interrupt reset (WC)
    FILL_2 bitfield length 2 fill tag $$;
    S_ERR_RESET bitfield length 1 mask;         /* Soft error interrupt reset (WC)
    PMON_RESET bitfield length 1 mask;          /* Performance monitoring interrupt reset (WC)
    H_ERR_RESET bitfield length 1 mask;         /* Hard error interrupt reset (WC)
    PWRFL_RESET bitfield length 1 mask;         /* Power fail interrupt reset (WC)
    HALT_RESET bitfield length 1 mask;          /* Halt pin interrupt reset (WC)
end PR19INTSYS_BITS;
/* Ebox registers, continued.

constant PMFCNT equals %x7B tag $;      /* Performance monitoring facility count register (RW)
PR19PMFCNT_BITS structure fill prefix PMFCNT$;
    PMCTR0 bitfield length 16 mask;     /* PMCTR0 word
    PMCTR1 bitfield length 16 mask;     /* PMCTR1 word
end PR19PMFCNT_BITS;

constant PCSCR equals %x7C tag $;       /* Patchable control store control register (RW)
PR19PCSCR_BITS structure fill prefix PCSCR$;
    FILL_1 bitfield length 8 fill tag $$;
    PAR_PORT_DIS bitfield length 1 mask;        /* Disable parallel port control of scan chain (RW)
    PCS_ENB bitfield length 1 mask;             /* Enable use of patchable control store (RW)
    PCS_WRITE bitfield length 1 mask;           /* Write scan chain to patchable control store (WO)
    RWL_SHIFT bitfield length 1 mask;           /* Shift read-write latch scan chain by one bit (WO)
    DATA bitfield length 1 mask;                /* Data to be shifted into the PCS scan chain (RW)
    FILL_2 bitfield length 10 fill tag $$;
    NONSTANDARD_PATCH bitfield length 1 mask;   /* PCS loaded with a non-standard patch (RW)
    PATCH_REV bitfield length 5 mask;           /* Patch revision number (RW)
    FILL_3 bitfield length 3 fill tag $$;
end PR19PCSCR_BITS;

constant ECR equals %x7D tag $;         /* Ebox control register (RW)
PR19ECR_BITS structure fill prefix ECR$;
    VECTOR_PRESENT bitfield length 1 mask;      /* Vector unit present (RW)
    FBOX_ENABLE bitfield length 1 mask;         /* Fbox enabled (RW)
    TIMEOUT_EXT bitfield length 1 mask;         /* Select external timebase for S3 stall timeout
    FBOX_ST4_BYPASS_ENABLE bitfield length 1 mask;      /* Fbox stage 4 conditional bypass enable
    TIMEOUT_OCCURRED bitfield length 1 mask;    /* S3 stall timeout occurred (WC)
    TIMEOUT_TEST bitfield length 1 mask;        /* Select test mode for S3 stall timeout (RW)
    TIMEOUT_CLOCK bitfield length 1 mask;       /* Clock S3 timeout (RW)
    ICCS_EXT bitfield length 1 mask;            /* Full ICCS implemented in external logic (RW)
    FILL_1 bitfield length 5 fill tag $$;
    FBOX_TEST_ENABLE bitfield length 1 mask;    /* Enable test of Fbox (RW)
    FILL_2 bitfield length 2 fill tag $$;
    PMF_ENABLE bitfield length 1 mask;          /* Performance monitoring facility enable
    PMF_PMUX bitfield length 2 mask;            /* Performance monitoring facility mux select
        constant PMUX_IBOX equals %b00;         /* Select Ibox
        constant PMUX_EBOX equals %b01;         /* Select Ebox
        constant PMUX_MBOX equals %b10;         /* Select Mbox
        constant PMUX_CBOX equals %b11;         /* Select Cbox
    PMF_EMUX bitfield length 3 mask;            /* Performance monitoring facility Ebox mux select (RW)
        constant EMUX_S3_STALL equals %b000;    /* Measure S3 stall against total cycles
        constant EMUX_EM_PA_STALL equals %b001; /* Measure EM+PA queue stall against total cycles
        constant EMUX_CPI equals %b010;         /* Measure instructions retired against total cycles
        constant EMUX_STALL equals %b011;       /* Measure total stalls against total cycles
        constant EMUX_S3_STALL_PCT equals %b100;        /* Measure S3 stall against total stalls
        constant EMUX_EM_PA_STALL_PCT equals %b101;     /* Measure EM+PA queue stall against total stalls
        constant EMUX_UWORD equals %b111;       /* Count microword increments
    PMF_LFSR bitfield length 1 mask;            /* Performance monitoring facility Wbus LFSR enable (RW)
    FILL_3 bitfield length 8 fill tag $$;
    PMF_CLEAR bitfield length 1 mask;           /* Clear performance monitoring hardware counters (WO)
end PR19ECR_BITS;
/* Mbox TB registers.

/* These registers are for testability and diagnostics use only.
/* They should not be referenced in normal operation.

constant MTBTAG equals %x7E tag $;      /* Mbox TB tag fill (WO)
PR19MTBTAG_BITS structure fill prefix MTBTAG$;
    TP bitfield length 1 mask;          /* Tag parity bit
    FILL_1 bitfield length 8 fill tag $$;
    VPN bitfield length 23 mask;        /* Virtual page number of address (VA<31:9>)
end PR19MTBTAG_BITS;

constant MTBPTE equals %x7F tag $;      /* Mbox TB PTE fill (WO)
PR19MTBPTE_BITS structure fill prefix MTBPTE$;  /* Format is normal PTE format, except for PTE parity bit
    PFN bitfield length 23 mask;        /* Page frame number (PA<31:9>)
    FILL_1 bitfield length 1 fill tag $$;
    P bitfield length 1 mask;           /* PTE parity
    FILL_2 bitfield length 1 fill tag $$;
    M bitfield length 1 mask;           /* Modify bit
    PROT bitfield length 4 mask;        /* Protection field
    V bitfield length 1 mask;           /* PTE valid bit
end PR19MTBPTE_BITS;
/* Vector architecture registers

constant VPSR equals %x90 tag $;        /* Vector processor status register (RW)
PR19VPSR_BITS structure fill prefix VPSR$;
    VEN bitfield length 1 mask;         /* Vector processor enabled (RW)
    RST bitfield length 1 mask;         /* Vector processor state reset (WO)
    FILL_1 bitfield length 5 fill tag $$;
    AEX bitfield length 1 mask;         /* Vector arithmetic exception (WC)
    FILL_2 bitfield length 16 fill tag $$;
    IMP bitfield length 1 mask;         /* Implementation-specific hardware error (WC)
    FILL_3 bitfield length 6 fill tag $$;
    BSY bitfield length 1 mask;         /* Vector processor busy (RO)
end PR19VPSR_BITS;

constant VAER equals %x91 tag $;        /* Vector arithmetic exception register (RO)
PR19VAER_BITS structure fill prefix VAER$;
    F_UNDF bitfield length 1 mask;      /* Floating underflow
    F_DIVZ bitfield length 1 mask;      /* Floating divide-by-zero
    F_ROPR bitfield length 1 mask;      /* Floating reserved operand
    F_OVFL bitfield length 1 mask;      /* Floating overflow
    FILL_1 bitfield length 1 fill tag $$;
    I_OVFL bitfield length 1 mask;      /* Integer overflow
    FILL_2 bitfield length 10 fill tag $$;
    REGISTER_MASK bitfield length 16 mask;      /* Vector destination register mask
end PR19VAER_BITS;

constant VMAC equals %x92 tag $;        /* Vector memory activity register (RW)
constant VTBIA equals %x93 tag $;       /* Vector translation buffer invalidate all
/* Cbox registers.

constant CCTL equals %xA0 tag $;        /* Cbox control register (RW)
PR19CCTL_BITS structure fill prefix CCTL$;
    ENABLE bitfield length 1 mask;      /* Enable Bcache (RW)
    TAG_SPEED bitfield length 1 mask;   /* Tag RAM speed (RW)
        constant TAG_3_CYCLES equals 0; /* Select tag RAM speed: 3-cycle read rep/3-cycle write rep
        constant TAG_4_CYCLES equals 1; /* Select tag RAM speed: 4-cycle read rep/4-cycle write rep
    DATA_SPEED bitfield length 2 mask;  /* Data RAM speed (RW)
        constant DATA_2_CYCLES equals 0;        /* Select data RAM speed: 2-cycle read rep/3-cycle write rep
        constant DATA_3_CYCLES equals 1;        /* Select data RAM speed: 3-cycle read rep/4-cycle write rep
        constant DATA_4_CYCLES equals 2;        /* Select data RAM speed: 4-cycle read rep/5-cycle write rep
    SIZE bitfield length 2 mask;        /* Bcache size (RW)
        constant SIZE_128KB equals 0;   /* Select 128KB Bcache
        constant SIZE_256KB equals 1;   /* Select 256KB Bcache
        constant SIZE_512KB equals 2;   /* Select 512KB Bcache
        constant SIZE_2MB equals 3;     /* Select 2MB Bcache
    FORCE_HIT bitfield length 1 mask;           /* Force Bcache hit (RW)
    DISABLE_ERRORS bitfield length 1 mask;      /* Disable Bcache ECC errors (RW)
    SW_ECC bitfield length 1 mask;              /* Enable use of software ECC (RW)
    TIMEOUT_TEST bitfield length 1 mask;        /* Enable test of Cbox read timeout counters (RW)
    DISABLE_PACK bitfield length 1 mask;        /* Disable write packing (RW)
    PM_ACCESS_TYPE bitfield length 3 mask;      /* Performance monitoring access type (RW)
        constant PMAT_COH equals 0;             /* Coherency access of either type
        constant PMAT_COH_READ equals 1;        /* Coherency access for READ
        constant PMAT_COH_OREAD equals 2;       /* Coherency access for OREAD
        constant PMAT_CPU equals 4;             /* CPU access of any type
        constant PMAT_CPU_IREAD equals 5;       /* CPU access for IREAD
        constant PMAT_CPU_DREAD equals 6;       /* CPU access for DREAD
        constant PMAT_CPU_OREAD equals 7;       /* CPU access for OREAD
    PM_HIT_TYPE bitfield length 2 mask;         /* Performance monitoring hit type (RW)
        constant PMHT_HIT equals 0;             /* Hit
        constant PMHT_HIT_OWNED equals 1;       /* Hit on owned block
        constant PMHT_HIT_VALID equals 2;       /* Hit on valid block
        constant PMHT_MISS_OWNED equals 3;      /* Miss on owned block (causes writeback)
    FORCE_NDAL_PERR bitfield length 1 mask;     /* Forces a parity error on the NDAL, on next outgoing transaction
    FILL_1 bitfield length 13 fill tag $$;
    SW_ETM bitfield length 1 mask;      /* Enter software error transition mode (RW)
    HW_ETM bitfield length 1 mask;      /* Error transition mode entered due to error (WC)
end PR19CCTL_BITS;

constant BCDECC equals %xA2 tag $;      /* Bcache data RAM ECC (WO)
PR19BCDECC_BITS structure fill prefix BCDECC$;
    FILL_1 bitfield length 6 fill tag $$;
    ECCLO bitfield length 4 mask;       /* ECC check bits <3:0>
    FILL_2 bitfield length 12 fill tag $$;
    ECCHI bitfield length 4 mask;       /* ECC check bits <7:4>
    FILL_3 bitfield length 6 fill tag $$;
end PR19BCDECC_BITS;
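
The BCDECC layout splits an 8-bit ECC check value into two 4-bit fields separated by fill bits. A minimal sketch of packing and unpacking such a value follows; it assumes fields are allocated in declaration order from bit 0, which places the low nibble at bits <9:6> and the high nibble at bits <25:22>. The function names are hypothetical.

```python
# Bit positions derived from the declared widths (6 fill, 4, 12 fill, 4, 6 fill).
# ASSUMPTION: the first declared field starts at bit 0.
ECC_LOW_SHIFT = 6            # low nibble follows the 6 fill bits
ECC_HIGH_SHIFT = 6 + 4 + 12  # high nibble follows the 12 fill bits


def pack_bcdecc(ecc: int) -> int:
    """Place an 8-bit ECC check value into the two 4-bit register fields."""
    if not 0 <= ecc <= 0xFF:
        raise ValueError("ECC check value must fit in 8 bits")
    low = ecc & 0xF
    high = (ecc >> 4) & 0xF
    return (low << ECC_LOW_SHIFT) | (high << ECC_HIGH_SHIFT)


def unpack_bcdecc(value: int) -> int:
    """Recover the 8-bit ECC check value from a register image."""
    low = (value >> ECC_LOW_SHIFT) & 0xF
    high = (value >> ECC_HIGH_SHIFT) & 0xF
    return (high << 4) | low


if __name__ == "__main__":
    image = pack_bcdecc(0xA5)
    print(hex(image), hex(unpack_bcdecc(image)))
```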
/* Cbox registers, continued

constant BCETSTS equals %xA3 tag $;     /* Bcache error tag status (RW)
PR19BCETSTS_BITS structure fill prefix BCETSTS$;
    LOCK bitfield length 1 mask;        /* Tag store registers are locked due to an error (WC)
    CORR bitfield length 1 mask;        /* Correctable error occurred (WC)
    UNCORR bitfield length 1 mask;      /* Uncorrectable error occurred (WC)
    BAD_ADDR bitfield length 1 mask;    /* Addressing error occurred (WC)
    LOST_ERR bitfield length 1 mask;    /* Error occurred while register was locked (WC)
    TS_CMD bitfield length 5 mask;      /* Tag store command which caused error (RO)
        constant CMD_DREAD equals %b00111;      /* Command was D-stream tag lookup
        constant CMD_IREAD equals %b00011;      /* Command was I-stream tag lookup
        constant CMD_OREAD equals %b00010;      /* Command was OREAD tag lookup for write or read lock
        constant CMD_WUNLOCK equals %b01000;    /* Command was write unlock tag lookup (done only under ETM)
        constant CMD_R_INVAL equals %b01101;    /* Command was inval tag lookup for NDAL DREAD or IREAD
        constant CMD_O_INVAL equals %b01001;    /* Command was inval tag lookup for NDAL OREAD or WRITE
        constant CMD_IPR_DEALLOC equals %b01010;        /* Command was tag lookup for IPR deallocate
    FILL_1 bitfield length 22 fill tag $$;
end PR19BCETSTS_BITS;

constant BCETIDX equals %xA4 tag $;     /* Bcache error tag index (RO)

constant BCETAG equals %xA5 tag $;      /* Bcache error tag (RO)

?II1_1 bitfield length i fill tag 55; 

    ECC bitfield length 6 mask;         /* ECC bits
/* Cbox registers, continued

constant BCEDSTS equals %xA6 tag $;     /* Bcache error data status (RW)
PR19BCEDSTS_BITS structure fill prefix BCEDSTS$;
    LOCK bitfield length 1 mask;        /* Data RAM registers are locked due to an error (WC)
    CORR bitfield length 1 mask;        /* Correctable ECC error occurred (WC)
    UNCORR bitfield length 1 mask;      /* Uncorrectable ECC error occurred (WC)
    BAD_ADDR bitfield length 1 mask;    /* Addressing error occurred (WC)
    LOST_ERR bitfield length 1 mask;    /* Error occurred while register was locked (WC)
    FILL_1 bitfield length 3 fill tag $$;
    DR_CMD bitfield length 4 mask;      /* Data RAM command which caused error (RO)
        constant CMD_DREAD equals %b0111;       /* Command was D-stream data lookup
        constant CMD_IREAD equals %b0011;       /* Command was I-stream data lookup
        constant CMD_WBACK equals %b0100;       /* Command was writeback data lookup
        constant CMD_RMW equals %b0010;         /* Command was read-modify-write data lookup
    FILL_2 bitfield length 20 fill tag $$;
end PR19BCEDSTS_BITS;

constant BCEDIDX equals %xA7 tag $;     /* Bcache error data index (RO)

ECC (RO)

syndrome bits <3:0>

syndrome bits <7:4>
/* Cbox registers, continued

constant CEFADR equals %xAB tag $;      /* Fill error address (RO)

constant CEFSTS equals %xAC tag $;      /* Fill error status (RW)
PR19CEFSTS_BITS structure fill prefix CEFSTS$;
    RDLK bitfield length 1 mask;        /* Error occurred during a read lock (WC)
    LOCK bitfield length 1 mask;        /* CEFSTS and CEFADR registers are locked due to an error (WC)
    TIMEOUT bitfield length 1 mask;     /* Fill failed due to transaction timeout (WC)
    RDE bitfield length 1 mask;         /* Fill failed due to Read Data Error (WC)
    LOST_ERR bitfield length 1 mask;    /* Error occurred while register was locked (WC)
    ID0 bitfield length 1 mask;         /* NDAL id<0> for failed read (RO)
    IREAD bitfield length 1 mask;       /* Error occurred during an IREAD (RO)
    OREAD bitfield length 1 mask;       /* Error occurred during an OREAD (RO)
    WRITE bitfield length 1 mask;       /* Error occurred during a write (RO)
    TO_MBOX bitfield length 1 mask;     /* Data was destined for the Mbox (RO)
    RIP bitfield length 1 mask;         /* READ invalidate was pending (RO)
    OIP bitfield length 1 mask;         /* OREAD invalidate was pending (RO)
    DNF bitfield length 1 mask;         /* Data was not to be validated when fill completed (RO)
    RDLK_FL_DONE bitfield length 1 mask;        /* Last fill for read lock received (RO)
    REQ_FILL_DONE bitfield length 1 mask;       /* Requested fill quadword was received for this read.
    COUNT bitfield length 2 mask;               /* Number of requested fills received (RO)
    UNEXPECTED_FILL bitfield length 1 mask;     /* RDE or RDR was received from the NDAL when fill_ram not valid (WC)
/* Cbox registers, continued

constant NESTS equals %xAE tag $;       /* NDAL error status (RW)
PR19NESTS_BITS structure fill prefix NESTS$;
    NOACK bitfield length 1 mask;       /* Outgoing command was NACKed (WC)
    BADWDATA bitfield length 1 mask;    /* BADWDATA cycle transmitted (WC)
    LOST_OERR bitfield length 1 mask;   /* Outgoing error was lost while register was locked (WC)
    PERR bitfield length 1 mask;        /* NDAL parity error detected (WC)
    INCON_PERR bitfield length 1 mask;  /* Inconsistent parity error (parity error detected on
                                        /* ACKed transaction) (WC)
    LOST_PERR bitfield length 1 mask;   /* NDAL parity error detected while register was locked (WC)
    FILL_1 bitfield length 26 fill tag $$;
end PR19NESTS_BITS;

constant NEOADR equals %xB0 tag $;      /* NDAL error output address (RO)

constant NEOCMD equals %xB2 tag $;      /* NDAL error output command (RO)
PR19NEOCMD_BITS structure fill prefix NEOCMD$;
    CMD bitfield length 4 mask;         /* NDAL command on outgoing error transaction (see below)
    ID bitfield length 3 mask;          /* NDAL ID on outgoing error transaction
    FILL_1 bitfield length 1 fill tag $$;
    BYTE_EN bitfield length 8 mask;     /* Byte enables on outgoing error transaction
    FILL_2 bitfield length 14 fill tag $$;
    LEN bitfield length 2 mask;         /* Length on outgoing error transaction (see below)
end PR19NEOCMD_BITS;

constant NEDATHI equals %xB4 tag $;     /* NDAL error data high (RO)

constant NEDATLO equals %xB5 tag $;     /* NDAL error data low (RO)

constant NEICMD equals %xB8 tag $;      /* NDAL error input command (RO)
PR19NEICMD_BITS structure fill prefix NEICMD$;
    CMD bitfield length 4 mask;         /* NDAL command received on error transaction (see below)
    ID bitfield length 3 mask;          /* NDAL ID received on error transaction
    PARITY bitfield length 3 mask;      /* NDAL parity bits received on error transaction
    FILL_1 bitfield length 22 fill tag $$;
end PR19NEICMD_BITS;






/* Cbox registers, continued
/* Encoded NDAL length values

constant LEN_HW equals %b00 prefix NDAL$;       /* Length = hexaword
constant LEN_QW equals %b10 prefix NDAL$;       /* Length = quadword
constant LEN_OW equals %b11 prefix NDAL$;       /* Length = octaword

/* Encoded NDAL command values

constant CMD_NOP equals %b0000 prefix NDAL$;            /* Command = NOP
constant CMD_WRITE equals %b0010 prefix NDAL$;          /* Command = Write
constant CMD_WDISOWN equals %b0011 prefix NDAL$;        /* Command = Write disown
constant CMD_IREAD equals %b0100 prefix NDAL$;          /* Command = I-read
constant CMD_DREAD equals %b0101 prefix NDAL$;          /* Command = D-read
constant CMD_OREAD equals %b0110 prefix NDAL$;          /* Command = O-read
constant CMD_RDE equals %b1001 prefix NDAL$;            /* Command = Read data error
constant CMD_WDATA equals %b1010 prefix NDAL$;          /* Command = Write data
constant CMD_BADWDATA equals %b1011 prefix NDAL$;       /* Command = Bad write data
constant CMD_RDR0 equals %b1100 prefix NDAL$;           /* Command = Read data return 0
constant CMD_RDR1 equals %b1101 prefix NDAL$;           /* Command = Read data return 1
constant CMD_RDR2 equals %b1110 prefix NDAL$;           /* Command = Read data return 2
constant CMD_RDR3 equals %b1111 prefix NDAL$;           /* Command = Read data return 3
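
When examining a captured NDAL error command field, it can be convenient to translate the 4-bit command and 2-bit length encodings back into their descriptions. The lookup below is a sketch that uses the encodings and comment text as read from the listing above; the function names are hypothetical, and encodings that do not appear in the listing are reported simply as not listed.

```python
# Encodings and descriptions as read from the NDAL command constants above.
NDAL_COMMANDS = {
    0b0000: "NOP",
    0b0010: "Write",
    0b0011: "Write disown",
    0b0100: "I-read",
    0b0101: "D-read",
    0b0110: "O-read",
    0b1001: "Read data error",
    0b1010: "Write data",
    0b1011: "Bad write data",
    0b1100: "Read data return 0",
    0b1101: "Read data return 1",
    0b1110: "Read data return 2",
    0b1111: "Read data return 3",
}

# Encodings as read from the NDAL length constants above.
NDAL_LENGTHS = {
    0b00: "hexaword",
    0b10: "quadword",
    0b11: "octaword",
}


def describe_ndal_command(code: int) -> str:
    """Return the description for a 4-bit NDAL command encoding."""
    return NDAL_COMMANDS.get(code & 0xF, "not listed")


def describe_ndal_length(code: int) -> str:
    """Return the description for a 2-bit NDAL length encoding."""
    return NDAL_LENGTHS.get(code & 0x3, "not listed")


if __name__ == "__main__":
    print(describe_ndal_command(0b0110), "/", describe_ndal_length(0b10))
```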
/* Cbox registers, continued

constant BCTAG equals %x01000000 tag $;                 /* First of 64K Bcache tag IPRs (RW)
constant BCTAG_128KB_MAX equals %x0101FFE0 tag $;       /* Last tag IPR for 128KB Bcache
constant BCTAG_256KB_MAX equals %x0103FFE0 tag $;       /* Last tag IPR for 256KB Bcache
constant BCTAG_512KB_MAX equals %x0107FFE0 tag $;       /* Last tag IPR for 512KB Bcache
constant BCTAG_2MB_MAX equals %x011FFFE0 tag $;         /* Last tag IPR for 2MB Bcache

constant IPR_INCR equals %x20 prefix BCTAG$;            /* Increment between Bcache tag IPR numbers
PR19BCTAG_BITS structure fill prefix BCTAG$;
    FILL_1 bitfield length 9 fill tag $$;
    VALID bitfield length 1 mask;       /* Valid bit (RW)
    OWNED bitfield length 1 mask;       /* Ownership bit (RW)
    ECC bitfield length 6 mask;         /* ECC bits (RW)
    TAG bitfield length 15 mask;        /* tag data (RW)
end PR19BCTAG_BITS;

constant BCFLUSH equals %x01400000 tag $;               /* First of 64K Bcache tag deallocate IPRs (WO)
constant BCFLUSH_128KB_MAX equals %x0141FFE0 tag $;     /* Last deallocate IPR for 128KB Bcache
constant BCFLUSH_256KB_MAX equals %x0143FFE0 tag $;     /* Last deallocate IPR for 256KB Bcache
constant BCFLUSH_512KB_MAX equals %x0147FFE0 tag $;     /* Last deallocate IPR for 512KB Bcache
constant BCFLUSH_2MB_MAX equals %x015FFFE0 tag $;       /* Last deallocate IPR for 2MB Bcache

constant IPR_INCR equals %x20 prefix BCFLUSH$;          /* Increment between Bcache deallocate IPR numbers
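
The Bcache tag IPR numbers form a regular sequence, so the IPR number for a given tag index, and the number of tags for each cache size, follow from the base, the increment, and the per-size maximum. The sketch below works through that arithmetic using the BCTAG values as read from the listing; the function and variable names are hypothetical.

```python
# Values as read from the BCTAG constants above.
BCTAG_BASE = 0x01000000
BCTAG_IPR_INCR = 0x20
BCTAG_MAX = {
    "128KB": 0x0101FFE0,
    "256KB": 0x0103FFE0,
    "512KB": 0x0107FFE0,
    "2MB": 0x011FFFE0,
}
CACHE_BYTES = {
    "128KB": 128 * 1024,
    "256KB": 256 * 1024,
    "512KB": 512 * 1024,
    "2MB": 2 * 1024 * 1024,
}


def bctag_ipr(index: int) -> int:
    """IPR number of the tag entry with the given zero-based index."""
    return BCTAG_BASE + index * BCTAG_IPR_INCR


def tag_count(size: str) -> int:
    """Number of tag IPRs between the base and the last IPR for this size."""
    return (BCTAG_MAX[size] - BCTAG_BASE) // BCTAG_IPR_INCR + 1


if __name__ == "__main__":
    for size in BCTAG_MAX:
        n = tag_count(size)
        print(size, "tags:", n, "bytes per tag:", CACHE_BYTES[size] // n)
```

With these values every cache size works out to 32 bytes per tag entry, and the 2MB configuration uses all 64K tag IPRs.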
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/* Ibox registers. 

constant VMAR equals %xD0 tag S; /* VIC memory address register 
PR19VMAR_BITS structure fill prefix VMAR$; 
FILL_1 bitfield length 2 fill tag SS; 

LW bitfield length 1 mask; /* longword within quadword 
SUB_BLOCK bitfield length 2 mask; /* sub-block indicator 
ROVf_INDEX bitfield length 6 mask; /* cache row index 
ADDR bitfield length 21 mask; /* error address 
end PR19VMAR_BITS; 

constant VTA3 equals %xDl tag £; /* VIC tag register 

?R19VTAS_EITS structure fill prefix VTAsi ; 
V bitfield length 4 mask; /* data valid bits 
DP bitfield length 4 mask; /* data parity bits 
TP bitfield length 1 mask; /* tag parity bit 
FIL1_1 bitfield length 2 fill tag SS; /« unused bits (zero) 
Z?.S bitfield length 21 mask; /- tag 

end PP.19VT A3_B"S; 

constant VSATA equals %xi;2 tag S; /* VIC data register 

constant ICSR equals %xD3 tag $; /* Ibox control and status register (RW)
PR19ICSR_BITS structure fill prefix ICSR$;
ENABLE bitfield length 1 mask; /* enable bit (RW)
FILL_1 bitfield length 1 fill tag $$;
LOCK bitfield length 1 mask; /* Register is locked due to an error (WC)
DPERR bitfield length 1 mask; /* data parity error (RO)
TPERR bitfield length 1 mask; /* tag parity error
FILL_2 bitfield length 27 fill tag $$;
end PR19ICSR_BITS;

constant BPCR equals %xD4 tag $; /* Ibox branch prediction control register
PR19BPCR_BITS structure fill prefix BPCR$;
HISTORY bitfield length 4 mask; /* branch history bits
FILL_1 bitfield length 1 fill tag $$;
MISPREDICT bitfield length 1 mask; /* history of last branch
FLUSH_BHT bitfield length 1 mask; /* flush branch history table
FLUSH_CTR bitfield length 1 mask; /* flush branch hist addr counter
LOAD_HISTORY bitfield length 1 mask; /* write new history to array
FILL_2 bitfield length 7 fill tag $$; /* unused bits (must be zero)
BPU_ALGORITHM bitfield length 16 mask; /* branch prediction algorithm

constant BPU_ALGORITHM equals %xFECA; /* default value for BPU_ALGORITHM field
end PR19BPCR_BITS;

/* The following two registers are for testability and diagnostics use only. 
/* They should not be referenced in normal operation. 

constant BPC equals %xD6 tag $; /* Ibox Backup PC (RO)

constant BPCUNW equals %xD7 tag $; /* Ibox Backup PC with RLOG unwind (RO)
/* Mbox internal memory management registers.
/* These registers are for testability and diagnostics use only.
/* In normal operation, the equivalent architecturally-defined registers
/* should be used instead.

constant MP0BR equals %xE0 tag $; /* Mbox P0 base register (RW)
constant MP0LR equals %xE1 tag $; /* Mbox P0 length register (RW)
constant MP1BR equals %xE2 tag $; /* Mbox P1 base register (RW)
constant MP1LR equals %xE3 tag $; /* Mbox P1 length register (RW)
constant MSBR equals %xE4 tag $; /* Mbox system base register (RW)
constant MSLR equals %xE5 tag $; /* Mbox system length register (RW)
constant MMAPEN equals %xE6 tag $; /* Mbox memory management enable (RW)

/* Mbox registers.

constant PAMODE equals %xE7 tag $; /* Mbox physical address mode (RW)

PR19PAMODE_BITS structure fill prefix PAMODE$;
MODE bitfield length 1 mask; /* Addressing mode (1 = 32-bit addressing) (RW)
constant PA_30 equals 0; /* 30-bit PA mode
constant PA_32 equals 1; /* 32-bit PA mode
FILL_1 bitfield length 31 fill tag $$;
end PR19PAMODE_BITS;

constant MMEADR equals %xE8 tag $; /* Mbox memory management fault address (RO)

constant MMEPTE equals %xE9 tag $; /* Mbox memory management fault PTE address (RO)

constant MMESTS equals %xEA tag $; /* Mbox memory management fault status (RO)
PR19MMESTS_BITS structure fill prefix MMESTS$;
LV bitfield length 1 mask; /* ACV fault due to length violation
PTE_REF bitfield length 1 mask; /* ACV/TNV fault occurred on PPTE reference
M bitfield length 1 mask; /* Reference had write or modify intent
FILL_1 bitfield length 11 fill tag $$;

FAULT bitfield length 2 mask; /* Fault type, one of the following:
constant FAULT_ACV equals 1; /* ACV fault

SRC bitfield length 3 mask; /* Shadow copy of LOCK bits (see MSRC$ constants below)
LOCK bitfield length 3 mask; /* Lock status (see MSRC$ constants below)
end PR19MMESTS_BITS;

constant TBADR equals %xEC tag $; /* Mbox TB parity error address (RO)

constant TBSTS equals %xED tag $; /* Mbox TB parity error status (RW)

PR19TBSTS_BITS structure fill prefix TBSTS$;
LOCK bitfield length 1 mask; /* Register is locked due to an error (WC)
DPERR bitfield length 1 mask; /* Data parity error (RO)
TPERR bitfield length 1 mask; /* Tag parity error
EM_VAL bitfield length 1 mask; /* EM latch was valid when error occurred (RO)
CMD bitfield length 5 mask; /* S5 command when TB parity error occurred (RO)
FILL_1 bitfield length 20 fill tag $$;
SRC bitfield length 3 mask; /* Source of original reference (see MSRC$ constants below)
end PR19TBSTS_BITS;

constant IREF_LATCH equals 6 prefix MSRC$; /* Source of fault was IREF latch
constant SPEC_QUEUE equals 4 prefix MSRC$; /* Source of fault was spec queue
constant EM_LATCH equals 0 prefix MSRC$; /* Source of fault was EM latch
/* Mbox Pcache registers

constant PCADR equals %xF2 tag $; /* Mbox Pcache parity error address (RO)

constant PCSTS equals %xF4 tag $; /* Mbox Pcache parity error status (RW)
PR19PCSTS_BITS structure fill prefix PCSTS$;
LOCK bitfield length 1 mask; /* Register is locked due to an error (WC)
DPERR bitfield length 1 mask; /* Data parity error occurred (RO)
RIGHT_BANK bitfield length 1 mask; /* Right bank tag parity error occurred (RO)
LEFT_BANK bitfield length 1 mask; /* Left bank tag parity error occurred (RO)
CMD bitfield length 5 mask; /* S6 command when Pcache parity error occurred (RO)
PTE_ER_WR bitfield length 1 mask; /* Hard error on PTE DREAD occurred (orig ref was WRITE) (WC)
PTE_ER bitfield length 1 mask; /* Hard error on PTE DREAD occurred (WC)
FILL_1 bitfield length 21 fill tag $$;
end PR19PCSTS_BITS;

constant PCCTL equals %xF6 tag $; /* Mbox Pcache control (RW)
PR19PCCTL_BITS structure fill prefix PCCTL$;
D_ENABLE bitfield length 1 mask; /* Enable for invalidate, D-stream read/write/fill (RW)
I_ENABLE bitfield length 1 mask; /* Enable for invalidate, I-stream read/fill (RW)
FORCE_HIT bitfield length 1 mask; /* Enable force hit on Pcache references
BANK_SEL bitfield length 1 mask; /* Select left bank if 0, right bank if 1
P_ENABLE bitfield length 1 mask; /* Enable parity checking (RW)
PMM bitfield length 3 mask; /* Mbox performance monitor mode (RW)
ELEC_DISABLE bitfield length 1 mask; /* Pcache electrical disable bit (RW)
RED_ENABLE bitfield length 1 mask; /* Redundancy enable bit (RO)

end PR19PCCTL_BITS;

constant PCTAG equals %x01800000 tag $; /* First of 256 Pcache tag IPRs (RW)
constant PCTAG_MAX equals %x01801FE0 tag $; /* Last of 256 Pcache tag IPRs

constant IPR_INCR equals %x20 prefix PCTAG$; /* Increment between Pcache tag IPR numbers
PR19PCTAG_BITS structure fill prefix PCTAG$;
A bitfield length 1 mask; /* Allocation bit corresponding to index of this tag (RW)
V bitfield length 4 mask; /* Valid bits corresponding to the 4 data subblocks (RW)
P bitfield length 1 mask; /* Tag parity (RW)
FILL_1 bitfield length 6 fill tag $$;
TAG bitfield length 20 mask; /* Tag bits (RW)
end PR19PCTAG_BITS;

constant PCDAP equals %x01C00000 tag $; /* First of 1024 Pcache data parity IPRs (RW)
constant PCDAP_MAX equals %x01C01FF8 tag $; /* Last of 1024 Pcache data parity IPRs

constant IPR_INCR equals %x8 prefix PCDAP$; /* Increment between Pcache data parity IPR numbers
PR19PCDAP_BITS structure fill prefix PCDAP$;
DATA_PARITY bitfield length 8 mask; /* Even byte parity for the addressed quadword (RW)
FILL_1 bitfield length 24 fill tag $$;
end PR19PCDAP_BITS;

end PR19DEF;

end module $PR19DEF;
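
The bitfield layouts and indexed IPR ranges defined in this module can be checked with a short script. The following is a minimal Python sketch and is not part of the NVAX definitions. It assumes that SDL allocates bitfields from bit 0 upward in declaration order, and the helper names `decode` and `pcdap_ipr` are hypothetical. The VMAR field lengths and the PCDAP constants (first IPR %x01C00000, increment %x8, 1024 IPRs) are taken from the listing above.

```python
# Illustrative sketch only; helper names are hypothetical, not NVAX definitions.

# VMAR fields in declaration order as (name, length in bits).
# Assumption: SDL assigns bitfields starting at bit 0 (least significant).
VMAR_FIELDS = [
    ("FILL_1", 2),
    ("LW", 1),          # longword within quadword
    ("SUB_BLOCK", 2),   # sub-block indicator
    ("ROW_INDEX", 6),   # cache row index
    ("ADDR", 21),       # error address
]

PCDAP_FIRST = 0x01C00000  # first Pcache data parity IPR
PCDAP_INCR = 0x8          # increment between IPR numbers
PCDAP_COUNT = 1024        # number of Pcache data parity IPRs


def decode(value, fields):
    """Split a 32-bit register value into named fields, lowest bits first."""
    if sum(length for _, length in fields) != 32:
        raise ValueError("field lengths must total 32 bits")
    result = {}
    shift = 0
    for name, length in fields:
        result[name] = (value >> shift) & ((1 << length) - 1)
        shift += length
    return result


def pcdap_ipr(n):
    """Return the IPR number of the n-th Pcache data parity register."""
    if not 0 <= n < PCDAP_COUNT:
        raise IndexError("PCDAP index out of range")
    return PCDAP_FIRST + n * PCDAP_INCR


if __name__ == "__main__":
    # Build a sample VMAR value field by field, then take it apart again.
    vmar = (0x12345 << 11) | (0x2A << 5) | (2 << 3) | (1 << 2)
    print(decode(vmar, VMAR_FIELDS))
    print(hex(pcdap_ipr(PCDAP_COUNT - 1)))
```

The length check in `decode` is a cheap way to catch a mistyped field width: every structure in this module is expected to total 32 bits under the stated assumption. The same `first + n * increment` arithmetic applies to the other indexed IPR ranges, using their own first address and increment.
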

A

Addressing Modes
    General Register • 2-10
    PC-Relative • 2-11
Address Translation • 12-79
    P0 Process Space • 2-28
    P1 Process Space • 2-29
    System Space • 2-26
ALU • 8-18
ASTLVL
    See DEC Standard 032



B 



Backup Cache
    See Bcache
Bcache
    Addressing 128KB Cache • 13-15
    Addressing 256KB Cache • 13-16
    Addressing 2MB Cache • 13-18
    Addressing 512KB Cache • 13-17
    Data Store ECC Matrix • 13-40
    Disabling • 13-108
    Enabling • 13-108
    Interface Pin Descriptions • 13-10
    IPR Access • 13-56, 13-87
    Organization • 13-4
    Pin Timing • 13-6, 20-11
    RAM Speeds • 13-5
    Tag and Index Interpretation • 13-4
    Tag Store ECC Matrix • 13-38
BCDECC • 13-65
BCEDECC • 13-75
BCEDIDX • 13-74
BCEDSTS • 13-71
BCETAG • 13-69
BCETIDX • 13-69
BCETSTS • 13-66
BCFLUSH • 13-89
BCTAG • 13-87
Boundary Scan Register • 19-16
BPCR • 7-60
BPU • 7-55
    Branch History Table • 7-56
    Branch Mispredict Processing • 7-58
    Branch Prediction Algorithm • 7-55
    Branch Prediction Sequence • 7-56
    Branch Queue • 7-57
    Branch Stall • 7-58
    PC Loads • 7-58
Branch Condition Evaluator • 8-34
Branch Prediction Unit
    See BPU
Branch Queue • 8-46
Byte Mask Generation • 12-36
    Unaligned References • 12-57



C

Cache Coherency • 13-99
CCTL • 13-61
CEFADR • 13-80
CEFSTS • 13-76
Chip Clocking
    Clock Domain Crossing • 17-9
    Clock Skew • 17-7
    Controlling Inter-Chip Skew • 17-7
    External Oscillator • 17-1
    Generation and Distribution • 17-3
    Global Clock Distribution • 17-5
    Global Clock Waveforms • 17-5
    Initialization • 17-9
    Inter-Clock Skew • 17-9
    NDAL Clocks • 17-7
    NDAL Signals • 17-9
    Rise and Fall Times • 17-7
    Section Clock Distribution • 17-5
    Section Clock Waveforms • 17-6
    Self Skew • 17-8
    Test Environment • 17-2
Chip Initialization • 16-1
    Cache • 16-3
    Cbox • 13-119
    Console • 16-2
    Ebox • 8-84
    Hardware and Microcode • 16-1
    Ibox • 7-64
    Mbox • 12-120
    Microsequencer • 9-24
Chip Overview
    Box and Section Description • 4-1
    Major Buses • 4-4
    The Cbox • 4-4
    The Ebox and Microsequencer • 4-3
    The Fbox • 4-3
    The Ibox • 4-2
    The Mbox • 4-4
Chip Reset • 17-10
Clocking
    See Chip Clocking
Complex Specifier Unit
    See CSU
Console Halt • 2-40, 15-19
    Halt Codes • 15-19
CPUID • 2-44, 18-2, 18-7, 18-11
CSRD
    See DEC Standard 032
CSRS
    See DEC Standard 032
CSTD
    See DEC Standard 032
CSTS
    See DEC Standard 032
CSU • 7-40
    Branch Mispredict Effects • 7-52
    Ibox IPR Transactions • 7-53
    Microcode Control • 7-40
    Microcode Restrictions • 7-53
    Pipeline • 7-41
    RLOG • 7-51



D 



Data Types • 2-6 to 2-8 
Destination Queue • 7-32, 8-44 



E 



ECR • 8-81
Electrical Characteristics
    AC Characteristics • 20-7
    AC Conditions of Test • 20-7
    DC Characteristics • 20-1
    Maximum Ratings • 20-1
    Pin Capacitance • 20-4
    Pin Driver Impedance • 20-3
    Pin Levels • 20-4
    Power Dissipation Across Voltage and Cycle Time • 20-2
Error Handling and Recovery • 15-3
    Cache and Memory Errors • 15-9
    Cache Coherence • 15-10
    Error Analysis • 15-7
    Error Recovery • 15-8
    Retry • 15-17
    State Collection • 15-3
Errors
    Bcache • 13-111
    Dstream Memory • 7-65
    Istream Memory • 7-84
    Pcache Parity Error • 12-104
    TB Parity Error • 12-103
Error Transition Mode
    See ETM
ESP
    See DEC Standard 032
ETM • 13-104
Exceptions • 2-35
    Arithmetic • 2-36
    Ebox Handling • 8-67 to 8-72
    Emulated Instruction • 2-38
    Fbox Detected • 8-56
    Ibox Detected • 8-51
    Machine Check • 2-40
    Memory Management • 2-37
    Reserved Addressing Mode • 7-65
    Reserved Opcode • 7-65
    Vector • 2-40
Exception Stack Frame
    General • 2-33
    Minimum • 2-33



F 



Fbox Destination Scoreboard • 8-54 
Fbox Disabled Mode • 8-58
Fbox Result Handling • 8-53 
Fbox Stage 4 Bypass • 11-63 
Field Queue • 8-48 



G 



GPR • 2-4 



H 



Hard Error Interrupts • 15-49
    Event Descriptions • 15-51 to 15-56
    Parse Tree • 15-50 to 15-51
    Stack Frame • 15-49



I 



I/O Space Read Synchronization • 8-63, 12-33
IAK14 • 10-3, 10-14
IAK15 • 10-3, 10-14
IAK16 • 10-3, 10-14
IAK17 • 10-3, 10-14
Ibox IPR Access • 8-66
IBU • 7-19
    Branch Displacement Processing • 7-25
    DL Stall • 7-24
    Ebox Assist Processing • 7-25
    Exception and Error Processing • 7-28
    FPD Processing • 7-29
    Index Mode Specifiers • 7-27
    Instruction Context • 7-21
    Instruction Parse Completion • 7-28
    Loading New Opcode • 7-27
    Operand Access Types • 7-23
    PC and Delta PC • 7-24
    Quadword Immediate Specifiers • 7-26
    Reserved Addressing Modes • 7-26
    Reserved Opcodes • 7-28
    Specifier Identification • 7-21
    SPEC_CTRL Bus • 7-24
    Stop and Restart Conditions • 7-29
    V Access Mode Operands • 7-28
ICCS • 10-6, 10-13
ICR • 10-6, 10-13
ICSR • 7-16
IIU • 7-30
    Issue Stall • 7-30
    PC Queue and PC Loads • 7-31
Initialization
    See Chip Initialization
Instruction Burst Unit
    See IBU
Instruction Context • 7-21, 8-38, 9-22
Instruction Issue Unit
    See IIU
Instruction Parsing • 7-17
Instruction Queue • 9-20
Instruction Set • 2-11 to 2-24
INT.SYS Register • 8-30
Internal Processor Registers
    See IPRs
Internal Scan Register
    Cbox • 13-116
    Chip • 19-4
    Ebox • 8-87
    Ibox • 7-69
    Mbox • 12-122
    Microsequencer • 9-26
Interrupts • 2-33
Interrupt State Register • 10-9
Interrupt Summary • 10-10
Interrupt Vector • 10-3
Interval Timer • 10-5
INTSYS • 10-12, 10-14
IORESET
    See DEC Standard 032
IPL • 2-34
IPRs
    ASTLVL
        See DEC Standard 032
    BCDECC • 13-65
    BCEDECC • 13-75
    BCEDIDX • 13-74
    BCEDSTS • 13-71
    BCETAG • 13-69
    BCETIDX • 13-69
    BCETSTS • 13-66
    BCFLUSH • 13-89
    BCTAG • 13-87
    BPCR • 7-60
    CCTL • 13-61
    CEFADR • 13-80
    CEFSTS • 13-76
    CPUID • 2-44, 18-2, 18-7, 18-11
    CSRD
        See DEC Standard 032
    CSRS
        See DEC Standard 032
    CSTD
        See DEC Standard 032
    CSTS
        See DEC Standard 032
    ECR • 8-81
    ESP
        See DEC Standard 032
    Full listing • 2-52 to 2-60
    IAK14 • 10-3, 10-14
    IAK15 • 10-3, 10-14
    IAK16 • 10-3, 10-14
    IAK17 • 10-3, 10-14
    ICCS • 10-6, 10-13
    ICR • 10-6, 10-13
    ICSR • 7-16
    INTSYS • 10-12, 10-14
    IORESET
        See DEC Standard 032
    IPL • 2-34
    ISP
        See DEC Standard 032
    KSP
        See DEC Standard 032
    MAPEN • 2-25
    MCESR • 15-22
    MMAPEN • 12-40
    MMEADR • 12-41
    MMEPTE • 12-41
    MMESTS • 12-41, 12-95
    MP0BR • 12-38
    MP0LR • 12-39, 12-47
    MP1BR • 12-39, 12-47
    MP1LR • 12-39, 12-47
    MSBR • 12-39
    MSLR • 12-40, 12-47
    MTBPTE • 12-54
    MTBTAG • 12-52
    NEDATHI • 13-86
    NEDATLO • 13-86
    NEICMD • 13-85
    NEOADR • 13-83
    NEOCMD • 13-84
    NESTS • 13-81
    NICR • 10-6, 10-13
    P0BR • 2-29
    P1BR • 2-30
    P1LR • 2-30
    PAMODE • 2-4, 12-40
    PCADR • 12-43
    PCBB • 2-46
    PCCTL • 12-44, 12-71
    PCDAP • 12-46
    PCSCR • 8-80
    PCSTS • 12-43, 12-107
    PCTAG • 12-45
    PME • 18-7
    PMFCNT • 18-8
    RXCS
        See DEC Standard 032
    RXDB
        See DEC Standard 032
    SAVPC • 2-40, 15-19
    SAVPSL • 2-40, 15-19
    SBR • 2-26
    SCBB • 2-41
    SID • 2-44
    SIRR • 2-35, 10-13
    SISR • 2-35, 10-13
    SLR • 2-26
    SSP
        See DEC Standard 032
    TBADR • 12-42
    TBCHK
        See DEC Standard 032
    TBIA • 2-25, 12-55
    TBIS • 2-25, 12-54
    TBSTS • 12-42, 12-106
    TODR
        See DEC Standard 032
    TXCS
        See DEC Standard 032
    TXDB
        See DEC Standard 032
    USP
        See DEC Standard 032
    VAER
        See DEC Standard 032
    VDATA • 7-15
    VMAC
        See DEC Standard 032
    VMAR • 7-14
    VPSR
        See DEC Standard 032
    VTAG • 7-15
    VTBIA
        See DEC Standard 032



ISP
    See DEC Standard 032



J 



JTAG Test Port • 19-7



K 



Kernel Stack Not Valid • 15-87
    Stack Frame • 15-87
KSP
    See DEC Standard 032



L 



LFSR
    WBUS • 8-90



M 



Machine Check • 2-40, 15-22
    Codes • 15-24
    Event Descriptions • 15-33 to 15-47
    Parse Tree • 15-25 to 15-32
    Stack Frame • 15-22
MAPEN • 2-25
Mask Processing Unit
    See MPU
Mbox Commands • 12-23
Mbox Reference Order Restrictions • 12-25
MCESR • 15-22
MD Bus Rotator • 12-20
Memory Management Probe Status Encodings • 12-51
Microcode Format
    Ebox • 6-1 to 6-4, 8-4 to 8-6, 9-8
    Ibox CSU • 6-4 to 6-5
    Ibox IROM and Control PLAs • 6-5 to 6-7
Microcode Restrictions
    Ebox • 8-91 to 8-96
Microstack • 9-22
Microtest Fields • 8-39
Microtraps • 9-13 to 9-18
MMAPEN • 12-40
MMEADR • 12-41
MMEPTE • 12-41
MMESTS • 12-41, 12-95
MMGT.MODE Register • 8-30
MP0BR • 12-33
MP0LR • 12-39, 12-47
MP1BR • 12-39, 12-47
MP1LR • 12-39, 12-47
MPU • 8-32
MSBR • 12-39
MSLR • 12-40, 12-47
MTBPTE • 12-54
MTBTAG • 12-52



N 



NDAL
    Arbitration • 3-18
    Cache Coherency • 3-54
    Clear Write Buffer • 3-55
    Clocking • 3-18
    Description • 3-15
    Errors • 3-57 to 3-63
    Field Description • 3-27 to 3-37
    Information Transfer • 3-27
    Initialization • 3-64
    Interlock Support • 3-56
    Interrupts • 3-55
    Terms • 3-17
    Transactions • 3-38 to 3-53
NEDATHI • 13-86
NEDATLO • 13-86
NEICMD • 13-85
NEOADR • 13-83
NEOCMD • 13-84
NESTS • 13-81
NICR • 10-6, 10-13



O

Operand Queue Unit
    See OQU
Operand Specifier Processing • 7-32
OQU • 7-32
    Destination Queue • 7-32
    Destination Queue Interface • 7-37
    MD Allocation • 7-39
    Queue Entry Allocation • 7-38
    Source Queue • 7-32
    Source Queue Interface • 7-34



P

P0BR • 2-29
P0LR • 2-29
P1BR • 2-30
P1LR • 2-30
Page Table Entry Format • 2-31
PAMODE • 2-4, 12-40
Parallel Port
    Observe Cbox/Mbox • 12-123, 13-115
        Cbox Tag Store Command • 13-116
        Mbox MD Destination • 12-123
        Mbox MME State • 12-123
    Observe Ibox • 7-69
    Observe MAB • 8-87, 9-25
    Observe Mbox • 12-123
        S5 Command • 12-23
        S5 Reference Source • 12-123
    Operating Modes • 19-4
Patchable Control Store
    Loading • 9-3
    Overview • 9-3
Pcache • 12-21, 12-70
    Addressing • 12-70
    Address Redundancy Mapping • 12-77
    IPR Access • 12-75
    Logical Organization • 12-70
    Redundancy Logic • 12-77
    Replacement Algorithm • 12-74
PCADR • 12-43
PCB • 2-48
PCBB • 2-46
PCCTL • 12-44, 12-71
PCDAP • 12-46
PCSCR • 8-80
PCSTS • 12-43, 12-107
PCTAG • 12-45
Performance Monitoring Facility
    Base Address • 18-2
    Block Diagram • 18-8
    Cbox Event Selection • 13-61, 13-118, 18-6
    Configuring • 18-3
    Ebox Event Selection • 8-83, 18-4
    Enabling and Disabling • 18-6
    Ibox Event Selection • 7-16, 18-4
    Mbox Event Selection • 12-44, 12-126, 18-3
    Memory Data Structure Format • 18-2
    Memory Data Structure Updates • 18-2
PFQ • 7-17
Physical Address Space • 2-2, 12-81
Pin Description
    Cache Interface Pins • 3-10 to 3-12
    Clocking Pins • 3-8 to 3-9
    Clocks • 20-22
    Interrupt and Error Pins • 3-9 to 3-10
    Interrupts • 20-24
    NDAL • 3-4 to 3-7, 20-9
    Reset • 20-23
    Test • 20-24
    Test Pins • 3-13 to 3-14
Pinout • 3-1 to 3-3
Pipeline
    Exceptions • 5-16 to 5-21
    Fundamentals • 5-1 to 5-6
    Microtraps, Exceptions, and Interrupts • 5-11
    NVAX Overview • 5-6 to 5-11
    Stalls • 5-11 to 5-16, 8-10
PME • 18-7
PMFCNT • 18-8
PMFCNT Register • 8-37, 8-91
Population Counter • 8-36
Power Failure Interrupts • 15-48
Prefetch Queue
    See PFQ
Primary Cache
    See Pcache
Process Control Block • 2-48
PSL • 2-5, 8-27
PTE • 12-83



Q 



Q Register • 8-25 



R

Register File • 8-13
    Bypass • 8-14
    Valid, Fault, and Error Bits • 8-16
Reset
    See Chip Reset
Result Bypass • 8-26
Retire Queue • 8-47
RMUX • 8-23
RXCS
    See DEC Standard 032
RXDB
    See DEC Standard 032



S

S3 Stall Timeout • 8-84
S5 Reference Packet
    Access Type • 12-4
    Address • 12-4
    Command • 12-4
    Data • 12-4
    Data Length • 12-4
    Reference Destination • 12-4
    Reference Qualifiers • 12-5
    Tag • 12-4
S5 Reference Source
    Arbitration • 12-18, 12-28
    Cbox Latch • 12-16
    EM Latch • 12-9
    Iref Latch • 12-6
    MME Latch • 12-12
    PA Queue • 12-17
    Retry Dmiss Latch • 12-14
    Spec Queue • 12-8
    VAP Latch • 12-11
S6 Reference Packet
    Address • 12-5
    Byte Mask • 12-5
    Command • 12-5
    Data • 12-5
    Reference Destination • 12-5
    Reference Qualifiers • 12-5
SAVPC • 2-40, 15-19
SAVPSL • 2-40, 15-19
SBR • 2-26
SBU • 7-54
SCB • 2-41
SCBB • 2-41
Scoreboard Unit
    See SBU
SC Register • 8-29
Serial-Test Port • 19-7
Shifter • 8-21
SID • 2-44
SIRR • 2-35, 10-13
SISR • 2-35, 10-13
SLR • 2-26
Soft Error Interrupts • 15-57
    Event Descriptions • 15-69 to 15-86
    Parse Tree • 15-58 to 15-69
    Stack Frame • 15-57
Source Queue • 7-32, 8-43
SSP
    See DEC Standard 032
Stalls
    Ebox • 8-72 to 8-77
State Flags • 8-30
System Control Block • 2-41
    Vector • 2-41



T

TBADR • 12-42
TBCHK
    See DEC Standard 032
TBIA • 2-25, 12-55
TBIS • 2-25, 12-54
TBSTS • 12-42, 12-106
Timeout Counters • 13-46
TODR
    See DEC Standard 032
TXCS
    See DEC Standard 032
TXDB
    See DEC Standard 032



U 



Unaligned Reference Processing • 12-55
USP
    See DEC Standard 032



V

VAER
    See DEC Standard 032
VA Register • 8-25
VAX Restart Bit • 8-37
VDATA • 7-15
Vector Instruction Support Limitations • 14-3
VIBA • 7-6
VIC • 7-5
    Bypass • 7-9
    Control • 7-7
    Control and Error Registers • 7-14 to 7-16
    E%STOP_IBOX_H Effects • 7-1 *
    Enable • 7-13
    Exceptions and Errors • 7-10
    Fills • 7-8
    Flushing • 7-13
    Hits Under Miss • 7-10
    PC Load Effects • 7-10
    Performance Monitoring Hardware • 7-16
    Prefetch Start Conditions • 7-12
    Prefetch Stop Conditions • 7-12
    Reads • 7-8
    Writes • 7-9
Virtual Address Space • 2-1, 12-60
Virtual Instruction Cache
    See VIC
VMAC
    See DEC Standard 032
VMAR • 7-14
VPSR
    See DEC Standard 032
VTAG • 7-15
VTBIA
    See DEC Standard 032